Abstract. The Bayesian approach to inverse problems is studied in the
case where the forward map is a linear hypoelliptic pseudodifferential
operator and the measurement error is additive white Gaussian noise. The
measurement model for an unknown Gaussian random variable $U(x,\omega)$
is

$M_\delta(y,\omega) = A(U(x,\omega)) + \delta\mathcal{E}(y,\omega),$

where $A$ is a finitely many orders smoothing linear hypoelliptic operator
and $\delta > 0$ is the noise magnitude. The covariance operator $C_U$ of $U$ is a
self-adjoint, injective and elliptic pseudodifferential operator that is
smoothing of order $2r$.

If $\mathcal{E}$ took values in $L^2$, then in the Gaussian case computing the
conditional mean (and maximum a posteriori) estimate would be linked to solving
the minimisation problem

$T_\delta(m_\delta) = \arg\min_{u \in H^r} \{\|Au - m_\delta\|_{L^2}^2 + \delta^2\|C_U^{-1/2}u\|_{L^2}^2\}.$

However, Gaussian white noise does not take values in $L^2$ but in $H^{-s}$,
where $s > 0$ is big enough. A modification of the above approach to
solve the inverse problem is presented, covering the case of white
Gaussian measurement noise. Furthermore, the convergence of the conditional
mean estimate to the correct solution as $\delta \to 0$ is proven in appropriate
function spaces using microlocal analysis. The frequentist posterior
contraction rates are also studied.

Keywords: Posterior consistency, convergence rate, Bayesian inverse
problem, white noise, pseudodifferential operator
1. Introduction


Practical inverse problems arise from the need to extract information from
indirect data. For example, consider a device designed for measuring point
values of a physical quantity $u(x)$. Technological imperfections cause the
values of $u$ at nearby points to merge together in the measurement.
Mathematically this corresponds to the convolution $\Phi * u$ of $u$ with a point
spread function $\Phi$. The inverse problem is to recover the function $u$
approximately from a finite number of point values of $\Phi * u$ corrupted by
random white noise.

Computational inversion requires a finite representation of the quantity
$u(x)$. In this paper we promote the view that it is a good idea to design a
continuous model for u which then can be discretized with a desired number 
of degrees of freedom. This paves the way for the analysis of convergence 
as the discretization becomes finer. Such convergence enables switching 
between different discretizations consistently; this is crucial for multigrid 
methods and for certain parameter-choice strategies. 

In ill-posed inverse problems the measurement data alone is not sufficient
for noise-robust recovery of the quantity of interest. For instance, Fourier
transforming $\Phi * u$ gives $\hat{\Phi}\hat{u}$, so frequency-domain information is
lost in regions where $\hat{\Phi}$ is very close to zero. Therefore, successful
computational inversion requires some a priori information in addition to the
measurement data.
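
The loss of frequency-domain information can be illustrated numerically. The following is a minimal sketch, not the method of this paper: it assumes a periodic one-dimensional grid, a Gaussian point spread function, and a simple Tikhonov-type Fourier filter standing in for a priori information. The function name `deconvolution_demo` and all parameter values are hypothetical choices made for the illustration.

```python
import numpy as np

def deconvolution_demo(n=128, sigma=0.0132, noise_level=1e-3, alpha=1e-3, seed=0):
    """Compare naive Fourier division with a Tikhonov-type filter (illustrative only)."""
    rng = np.random.default_rng(seed)
    x = np.arange(n) / n                                  # periodic grid on [0, 1)
    u = np.sin(2 * np.pi * x) + 0.5 * np.cos(6 * np.pi * x)  # assumed smooth unknown

    # Periodic Gaussian point spread function, normalised to unit mass.
    dist = np.minimum(x, 1.0 - x)
    phi = np.exp(-0.5 * (dist / sigma) ** 2)
    phi /= phi.sum()

    # Noisy point values of the convolution phi * u.
    phi_hat = np.fft.fft(phi)
    data = np.fft.ifft(phi_hat * np.fft.fft(u)).real
    data += noise_level * rng.standard_normal(n)
    data_hat = np.fft.fft(data)

    # Naive inversion divides by phi_hat, which is tiny at high frequencies.
    naive = np.fft.ifft(data_hat / phi_hat).real
    # The filter damps exactly those frequencies where phi_hat is close to zero.
    filtered = np.fft.ifft(
        np.conj(phi_hat) * data_hat / (np.abs(phi_hat) ** 2 + alpha)
    ).real

    def rms(v):
        return float(np.sqrt(np.mean(v ** 2)))

    return rms(naive - u), rms(filtered - u), float(np.abs(phi_hat).min())

naive_err, filtered_err, min_phi_hat = deconvolution_demo()
print(naive_err, filtered_err, min_phi_hat)
```

With these assumed parameters the smallest value of $|\hat{\Phi}|$ is far below the noise level, so the naive reconstruction is dominated by amplified noise while the filtered one is not.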

Practical inversion is all about combining measurement data and a priori
information in a noise-robust way. The classical approach is
regularization, which assumes that the noise is deterministic and small in norm.
Regularization involves defining a family of continuous maps, parametrized
by the norm $\|\text{noise}\|_{L^2}$, from the data space to the space of unknown
quantities. This must be done so that as $\|\text{noise}\|_{L^2} \to 0$, the reconstruction
approaches the true solution along a stable path. This methodology was
originated by Tikhonov [53]. Both continuous and discrete cases have
been studied in depth; see, e.g., [9], [37], [38].

There is a serious drawback in the above noise model in the continuous
limit. Namely, continuous white noise is not square integrable. We
discuss this in detail below in Section 3.1. The goal of this paper is to use
Bayesian inversion to construct a consistent continuous-discrete framework
covering the case of white noise.

Bayesian inversion is a flexible framework for combining measurement
data and a priori information in the form of a posterior distribution; see,
e.g., [22], [33], [50]. Computational exploration of the finite-dimensional posterior
distribution yields useful estimates of the quantity of interest and enables
uncertainty quantification. Furthermore, analytic results about the
continuous model can then be restricted to a given resolution in a
discretization-invariant way.

Our approach to Bayesian inversion follows a general strategy of
computational mathematics: we consider a continuous model which can be
discretized for any practical setting.

We study the following continuous model for indirect measurements:

(1.1)  $M_\delta = AU + \delta\mathcal{E},$

where the random variables $M_\delta$ (data) and $U$ (quantity of interest) take
values in the Sobolev spaces $H^{-s}(N)$ and $H^{\tau}(N)$, respectively. Here $N$
is a $d$-dimensional compact manifold, e.g. a torus corresponding to a
$d$-dimensional cube with opposite sides glued together. The real parameter $\tau$
is related to our a priori information about the smoothness of the unknown
quantity of interest.


The measurement operator $A$ in our model (1.1) is quite general: we
assume it to be a finitely smoothing, injective hypoelliptic pseudodifferential
operator ($\Psi$DO); the precise definition is given later in the paper. This class
includes convolution operators with finitely smooth kernels. One example of an
operator that is hypoelliptic but not elliptic is the heat operator. For more
examples of hypoelliptic operators see the appendix. The measurement noise
$\mathcal{E}$ is assumed to be normalised white Gaussian noise with mean zero and
unit variance, and $\delta > 0$ models the noise amplitude.

We model practical measurement data by

(1.2)  $M_k = P_k(AU) + P_k(\mathcal{E})\delta.$

Here $P_k$ is a linear operator related to the measurement device; we assume that
$P_k$ is an orthogonal projection with $k$-dimensional range. We discretize the
unknown $U$ using some computationally feasible approximation of the form
$U_n = S_n U$. Now we can study the inverse problem

(1.3)  given a realisation of $M_k$, estimate $U_n$.


We are interested to know what happens to the approximate solutions
of (1.1) when $\delta \to 0$. The analysis of the small noise limit, also known as
the theory of posterior consistency, has attracted a lot of interest in the
last decade. Posterior convergence rates have been studied in a series of
papers; see, e.g., [25], [26], [28], [51]. However, much
remains to be done. Developing a comprehensive theory is important since
posterior consistency justifies the use of the Bayesian approach in the same
way as convergence results justify the use of regularisation techniques.

In the above mentioned papers the problem is studied from the frequentist
point of view, that is, the data is thought to be generated by a fixed 'true'
solution $u^\dagger$ instead of a random draw $U(\omega)$ from the prior distribution. This
means that all the randomness in $M_\delta$ comes from the randomness of the
noise $\mathcal{E}$. The interest is then in the contraction of the posterior distribution
around the 'true' solution as the noise goes to zero, see Subsection 2.1.2
and Theorem 2. The main emphasis of this paper, however, is on the purely
Bayesian approach where it is assumed that $U$ is also random. Since $U$ and
$\mathcal{E}$ are assumed independent we can write

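(1.4)  $M_\delta(\omega) = AU(\omega_1) + \mathcal{E}(\omega_2)\delta,$

where $\omega = (\omega_1, \omega_2) \in \Omega_1 \times \Omega_2$. In the Bayesian case the posterior distribution
is a function of $(\omega_1, \omega_2)$. Also the probability measure $d\mathbb{P}$ can be written in
the following form

$d\mathbb{P} = d\mathbb{P}_1(\omega_1)\,d\mathbb{P}_2(\omega_2).$

We will denote the expected value over the joint distribution of $U$ and $\mathcal{E}$ by
$\mathbb{E}$. The expected value over the noise is defined by

(1.5)  $\mathbb{E}_{\omega_1}(F(\omega_1, \omega_2)) = \int_{\Omega_2} F(\omega_1, \omega_2)\,d\mathbb{P}_2(\omega_2).$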

Our paper provides a conceptual advantage over much of the existing
literature. In many earlier studies $A$ and $C_U$ are perturbations of negative
powers of the operator $(I - \Delta)$. Our assumption, formulated in terms of
hypoelliptic operators, means roughly speaking that the measurement operator $A$
and the covariance operator $C_U$ do not need to have a common basis in their
singular value decomposition.

The rest of this paper is organised as follows. In Section 2 we
introduce the Bayesian setting we are using and present our main result about
convergence rates. In Section 3 we take a closer look at generalised
Gaussian random variables in Sobolev spaces and introduce the so-called white
noise paradox. We will also show that $U$ takes values in
$H^\tau$, where $\tau$ is related to the smoothness of the solution and depends on the
dimension of the space and the covariance of the prior. In Section 4 we
introduce hypoelliptic operators and prove Theorems 1 and 2. In the subsequent
section we characterise credible sets and frequentist confidence regions and
present and prove two theorems about their contraction. In the appendices we
give examples of some hypoelliptic operators and a computational example.


Notations.

$S^m$                Class of pseudodifferential symbols of order $m$. See Definition 1.
$\Psi^m$             Space of pseudodifferential operators ($\Psi$DO) of order $m$. See Definition 2.
$H\Psi^{m,m_0}$      Space of hypoelliptic $\Psi$DOs of type $(m, m_0)$. See Definition 3.
$\Psi^m_p$           Space of $\Psi$DOs of order $m$ depending on a spectral variable with order $p$. See Definition 4.
$H\Psi^{m,m_0}_p$    Space of hypoelliptic $\Psi$DOs of type $(m, m_0)$ depending on a spectral variable with order $p$. See Definition 5.
$\mathrm{Tr}_{H^\tau \to H^\tau}(C)$   The trace of the operator $C : H^\tau \to H^\tau$. See (3.8).


2. Convergence results

Let us return to our indirect measurement problem

$M_\delta = AU + \delta\mathcal{E},$

where we model $U = U(x,\omega)$, $M_\delta = M_\delta(y,\omega)$ and $\mathcal{E} = \mathcal{E}(y,\omega)$ as random
functions. Here $\omega \in \Omega$ is an element of a complete probability space $(\Omega, \Sigma, \mathbb{P})$
and $x$ and $y$ denote the variables in domains of Euclidean spaces. The reason
why we also model $U$ as a random variable is that even though the unknown
quantity is assumed to be deterministic we have only incomplete data of it.
All information available about $U$ before performing the measurements is
included in a prior distribution that is independent of the measurement.

The Bayesian inversion theory is based on the Bayes formula. To solve
the inverse problem (1.3) we have to express the available a priori information
about $U_n$ in the form of a prior distribution $\pi_{pr}$ in an $n$-dimensional subspace.
Let $M_k$ and $\mathcal{E}_k$ be random vectors taking values in $\mathbb{R}^k$ and denote their
distributions by $\pi_{M_k}$ and $\pi_{\mathcal{E}_k}$, respectively. The solution of the inverse problem
after performing the measurements is the posterior distribution of the
unknown random variable. Given a realisation $m_k$ of the discrete measurement,
the posterior density for $U_n$ taking values in the $n$-dimensional subspace is
given by the Bayes formula

(2.1)  $\pi(u \mid m_k) = \frac{\pi_{pr}(u)\,\pi_{\mathcal{E}_k}(m_k \mid u)}{\pi_{M_k}(m_k)} = C\,\pi_{pr}(u)\exp\Big(-\frac{1}{2\delta^2}\|m_k - \mathbf{A}u\|_{\ell^2}^2\Big), \quad u \in \mathbb{R}^n,\ m_k \in \mathbb{R}^k,$

where $\mathbf{A} = P_k A S_n$ is a $k \times n$ matrix approximation of the operator $A$.

An approximate solution of the inverse problem is often given as a point
estimate for (2.1). Let us assume that $U_n$ also has a Gaussian distribution.
The maximum a posteriori (MAP) estimate $T_\delta^{\mathrm{MAP}} : \mathbb{R}^k \to \mathbb{R}^n$ is defined by

(2.2)  $T_\delta^{\mathrm{MAP}}(M_k(\omega)) := \arg\max_{u \in \mathbb{R}^n} \pi(u \mid M_k(\omega)).$

Note that the MAP estimate depends on $\omega$ through the realisation of the
noise $\mathcal{E}_k(\omega)$ and the unknown $U_n(\omega)$. When $U_n$ is Gaussian distributed the MAP
estimate coincides almost surely with the conditional mean estimate (CM)

(2.3)  $T_\delta^{\mathrm{CM}}(M_k(\omega)) = \mathbb{E}(U_n \mid \mathcal{M}_k)(\omega)$  a.s.,

where $\mathcal{M}_k$ is the $\sigma$-algebra generated by $M_k$.

Since in our case CM = MAP a.s., we will consider the MAP estimate
below. Let us denote the covariance matrix of $U_n$ by $C_{U_n}$. Solving the
maximisation problem (2.2) with a fixed realisation of the noise and the unknown
corresponds to solving the minimisation problem

(2.4)  $T_\delta(m_k) = \arg\min_{u \in \mathbb{R}^n}\Big\{\frac{1}{2\delta^2}\|\mathbf{A}u - m_k\|_{\ell^2}^2 + \frac{1}{2}\|C_{U_n}^{-1/2}u\|_{\ell^2}^2\Big\}.$


Constructing $S_n$ and $\pi_{pr}$ is the core difficulty in Bayesian inversion. In
many inverse problems there is no natural discretisation for the continuum
quantity $U$, so $n$ can be freely chosen. Consequently, $S_n$ and $\pi_{pr}$ should in
principle be described for all $n > 0$. This raises the following questions: do
the chosen $S_n$ and $\pi_{pr}$ represent the same a priori knowledge consistently at
all resolutions $n > 0$? Does the estimate $T_\delta(m_k)$ converge as $n \to \infty$? See
e.g. [29], [31]. Also, the number of data points may change, for example due to
an updated measurement device. The aim of this paper is to build a rigorous
theory that allows us to connect discrete models to their infinite-dimensional
limit models in a consistent way.

We achieve a consistent representation of a priori knowledge by
constructing the prior distribution for $U$ in the infinite-dimensional space $X$. Then
the random variable $U_n = S_n U$ takes values in the finite-dimensional
subspace $X_n \subset X$ and represents approximately the same a priori knowledge
as $U$. In the same way we construct distributions for $M$ and $\mathcal{E}$ in the
infinite-dimensional space $Y$, in which case the random variables $M_k$ and $\mathcal{E}_k$ take
values in the finite-dimensional subspace $Y_k \subset Y$.

The finite-dimensional problem (2.4) $\Gamma$-converges as $n, k \to \infty$, under
certain assumptions (including that $m$ should be an $L^2$-function), to the
following infinite-dimensional minimisation problem in a Sobolev space

(2.5)  $\arg\min_{u \in H^r}\Big\{\frac{1}{2\delta^2}\|m_\delta - Au\|_{L^2(N)}^2 + \frac{1}{2}\|C_U^{-1/2}u\|_{L^2(N)}^2\Big\}.$


Above $C_U^{-1/2} \in \Psi^{r}$, that is, $C_U^{-1/2}$ is a $-r$ orders smoothing pseudodifferential
operator. See [23] for a proof. If we think of the above as a MAP
estimate for a Bayesian problem we have to assume that $U$ has formally the
following distribution

$\pi_{pr}(u) \underset{\text{formally}}{=} c\exp\Big(-\frac{1}{2}\|C_U^{-1/2}u\|_{L^2(N)}^2\Big).$


Formula (2.5) only makes sense if the noise is square integrable. Even though
$\|\mathcal{E}_k\|_{L^2} < \infty$ for every $k \in \mathbb{N}$, the limit as $k \to \infty$ is infinite. We will
return to this so-called 'white noise paradox' in Section 3.1.


2.1. Main result. Let us now modify formula (2.5) to arrive at something
useful for white Gaussian noise. When $\varepsilon \in L^2$ we can write

(2.6)  $\|m_\delta - Au\|_{L^2(N)}^2 = \|Au\|_{L^2(N)}^2 - 2\langle m_\delta, Au\rangle_{L^2(N)} + \|m_\delta\|_{L^2(N)}^2.$

Now, omitting the infinite 'constant term' $\|m_\delta\|_{L^2(N)}^2$, we obtain a
minimisation problem which is well-defined also when $m_\delta$ is not an $L^2$ function:

(2.7)  $T_\delta(m_\delta) := \arg\min_{u \in H^r(N)}\{\|Au\|_{L^2(N)}^2 - 2\langle m_\delta, Au\rangle + \delta^2\|C_U^{-1/2}u\|_{L^2(N)}^2\},$

where $\langle m_\delta, Au\rangle$ is interpreted as a suitable duality pairing instead of the $L^2(N)$
inner product. When $A \in \Psi^{-t}$, $t > -\tau + s$, we can define $\langle m_\delta, Au\rangle$ as the
duality pairing between $H^{-s}(N)$ and $H^{s}(N)$. Note that the forward operator $A$, the prior
distribution and the noise depend on each other only through the assumption
$t > -\tau + s$.


It is well-known that the solution of the finite-dimensional problem (2.4)
can be calculated using the following formula:

(2.8)  $T_\delta(m_k) = (\mathbf{A}^T\mathbf{A} + \delta^2 C_{U_n}^{-1})^{-1}\mathbf{A}^T m_k.$

We can write the approximate solution $u_\delta := T_\delta(m_\delta)$ of the continuous
problem (2.7) as

(2.9)  $T_\delta(m_\delta) = (A^*A + \delta^2 C_U^{-1})^{-1}A^* m_\delta.$
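
The closed-form solution can be checked numerically in finite dimensions. The sketch below is an illustration under stated assumptions, not code from the paper: a random matrix stands in for the discretised forward operator, a random symmetric positive definite matrix stands in for the prior covariance, and the helper names `map_estimate` and `objective` are hypothetical. It verifies that the solution of the linear system $(A^TA + \delta^2 C^{-1})u = A^Tm$ is a stationary point of the discrete analogue of the functional in (2.7), in which the constant term $\|m\|^2$ has already been dropped.

```python
import numpy as np

def map_estimate(A, C_U, m, delta):
    """Solve (A^T A + delta^2 C_U^{-1}) u = A^T m, the matrix analogue of (2.8)."""
    Z = A.T @ A + delta ** 2 * np.linalg.inv(C_U)
    return np.linalg.solve(Z, A.T @ m)

def objective(A, C_U, m, delta, u):
    """Discrete analogue of ||Au||^2 - 2<m, Au> + delta^2 ||C_U^{-1/2} u||^2."""
    Au = A @ u
    prior_term = u @ np.linalg.solve(C_U, u)   # equals ||C_U^{-1/2} u||^2
    return Au @ Au - 2.0 * (m @ Au) + delta ** 2 * prior_term

# Assumed toy problem: all sizes and values are arbitrary illustrations.
rng = np.random.default_rng(1)
k, n, delta = 12, 8, 0.1
A = rng.standard_normal((k, n))
B = rng.standard_normal((n, n))
C_U = B @ B.T + n * np.eye(n)                  # symmetric positive definite
m = rng.standard_normal(k)

u_map = map_estimate(A, C_U, m, delta)

# Gradient of the objective; it should vanish at the minimiser.
gradient = 2 * (A.T @ (A @ u_map) - A.T @ m
                + delta ** 2 * np.linalg.solve(C_U, u_map))
print(np.linalg.norm(gradient))
```

Because the functional is strictly convex, a vanishing gradient identifies the unique minimiser, so any perturbation of `u_map` increases the objective.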
Before stating the main result of the paper we study a simple example to
give the reader some insight into the Bayesian setting.

Example 1. Let $N$ be the 1-dimensional torus $\mathbb{T}^1$. We are interested in the
inverse problem

$M_\delta = AU + \delta\mathcal{E},$

where we assume that $\mathcal{E} \sim N(0, I)$ and $U \sim N(0, I)$, that is, both the noise
and the unknown are assumed to be normalised white Gaussian noise, see
Section 3.1 for the rigorous definition. The white noise takes values in $H^{\tau}$ with
some $\tau < -1/2$. On the other hand white noise has formally the following
distribution

$\pi_{pr}(u) \underset{\text{formally}}{=} c\exp\Big(-\frac{1}{2}\|u\|_{L^2(\mathbb{T}^1)}^2\Big).$

Hence we want to solve

$\arg\min_{u \in L^2}\{\|Au\|_{L^2}^2 - 2\langle m_\delta, Au\rangle + \delta^2\|u\|_{L^2(\mathbb{T}^1)}^2\}.$

Note that we are looking for a solution in $L^2(\mathbb{T}^1)$ even though the realisations
of $U$ are in $L^2(\mathbb{T}^1)$ with probability zero. In general, if we are interested in
finding a solution in $H^r$, then we can show that the prior should take
values in $H^\tau$ where $\tau = r - s$, see Section 3.2.

2.1.1. Convergence results in the Bayesian setting. We will now formulate the
main theorem of this paper about the convergence of the continuous solution
(2.9). The precise definitions are discussed in more detail in the following
sections, and Theorem 1 is proved in Section 4.

Theorem 1. Let $r, s \in [0, \infty)$ and $N$ be a $d$-dimensional closed manifold.
Let $U(x,\omega)$ be a generalised Gaussian random function taking values in
$H^\tau(N)$, $\tau = r - s$, with zero mean and covariance operator $C_U$. Assume
that the operator $C_U$ is a self-adjoint, injective and elliptic pseudodifferential
operator ($\Psi$DO) of order $-2r$. Let $\mathcal{E}(y,\omega)$ be white Gaussian noise on $N$.
Consider the measurement

$M_\delta(y,\omega) = A(U(\cdot,\omega)) + \delta\mathcal{E}(y,\omega), \quad \omega \in \Omega,$

where $A \in H\Psi^{-t,-t_0}$, $t > \max\{0, -\tau + s\}$ and $t \le t_0 < 2t + r$, is a hypoelliptic
pseudodifferential operator on the manifold $N$ and $A : L^2(N) \to L^2(N)$ is
injective. Above $\delta \in \mathbb{R}_+$ is the noise level and $\mathcal{E}$ takes values in $H^{-s}(N)$
with some $s > d/2$.

Take $\zeta < \tau - 3(t_0 - t)$. Then we have the following convergence

(2.10)  $\mathbb{E}\|U_\delta(\omega) - U(\omega)\|_{H^\zeta} \to 0$  as $\delta \to 0$.

The expectation is taken with respect to the joint distribution of $(U, \mathcal{E})$. We
have the following estimates for the speed of convergence:

(i) If $\zeta \le t - s - 2t_0$ then there is $C > 0$, independent of $\delta$, such that

(2.11)  $\mathbb{E}\|U_\delta(\omega) - U(\omega)\|_{H^\zeta} \le C\delta^{\frac{2t - t_0 + r}{t_0 + r}}.$

(ii) If $t - s - 2t_0 \le \zeta < \tau - 3(t_0 - t)$ then there is $C > 0$, independent
of $\delta$, such that

(2.12)  $\mathbb{E}\|U_\delta(\omega) - U(\omega)\|_{H^\zeta} \le C\delta^{-\frac{\zeta - \tau + 3(t_0 - t)}{t_0 + r}}.$

The different convergence speeds (i) and (ii) show the trade-off between 
the smoothness of the space and the speed of convergence. In case (i) we 
get better convergence rates but in case (ii) we can use a stronger norm. We 
see also that the smoother the forward operator A is the worse convergence 
rates we get. 
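
The trade-off between cases (i) and (ii) can be made concrete with a small helper that evaluates the exponent of $\delta$. This is a sketch under our reading of Theorem 1: the exponent is $(2t - t_0 + r)/(t_0 + r)$ when $\zeta \le t - s - 2t_0$ and $(\tau - \zeta - 3(t_0 - t))/(t_0 + r)$ otherwise, with $\tau = r - s$. The function name `rate_exponent` and the sample parameter values are hypothetical.

```python
def rate_exponent(zeta, r, s, t, t0):
    """Exponent of delta in the convergence estimates, as read from Theorem 1.

    Assumed reading: case (i) applies for zeta <= t - s - 2*t0 and
    case (ii) for t - s - 2*t0 <= zeta < tau - 3*(t0 - t), tau = r - s.
    """
    tau = r - s
    if not (t <= t0 < 2 * t + r):
        raise ValueError("need t <= t0 < 2t + r")
    if not zeta < tau - 3 * (t0 - t):
        raise ValueError("need zeta < tau - 3(t0 - t)")
    threshold = t - s - 2 * t0
    if zeta <= threshold:
        return (2 * t - t0 + r) / (t0 + r)           # case (i)
    return (tau - zeta - 3 * (t0 - t)) / (t0 + r)    # case (ii)

# Illustrative values: an elliptic operator (t = t0) on a 2-dimensional
# manifold with s slightly above d/2 = 1.
params = dict(r=2.0, s=1.1, t=2.0, t0=2.0)
for zeta in (-5.0, -3.1, -1.0, 0.0):
    print(zeta, rate_exponent(zeta, **params))
```

The two branches agree at the threshold $\zeta = t - s - 2t_0$, and the exponent decreases linearly in $\zeta$ above it: a stronger norm costs convergence speed.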


We note that instead of the estimates (2.11) or (2.12), we could
alternatively take the expected value only with respect to the noise, in which case
the constant $C$ would depend on the realisation of $U(\omega)$. That is, the proof of
Theorem 1 also shows that we have almost surely

(2.13)  $\limsup_{\delta \to 0} \frac{\mathbb{E}_{\omega_1}\|U_\delta(\omega) - U(\omega)\|_{H^\zeta}}{\delta^{\nu}} < \infty,$

where $\nu = \frac{2t - t_0 + r}{t_0 + r}$ when $\zeta \le t - s - 2t_0$ and $\nu = -\frac{\zeta - \tau + 3(t_0 - t)}{t_0 + r}$ when
$t - s - 2t_0 \le \zeta < \tau - 3(t_0 - t)$.


Remark 1. The MAP estimate $U_\delta$ takes values in the Cameron-Martin
space of the prior. The Cameron-Martin space is the intersection of all
linear subspaces to which the random variable $U$ belongs with probability one,
and since there may be uncountably many such linear spaces, the
Cameron-Martin space may be a subset of measure zero in the space where $U$ takes
values, see [3]. In the above setting, where $U \sim N(0, C_U)$ and $C_U$ is of order
$-2r$, the random variable $U$ takes values in $H^\tau$, $\tau = r - s$ where $s > d/2$,
and the Cameron-Martin space containing the MAP estimate is $H^r$. In the
Bayesian setting it is natural that the MAP estimate cannot converge in a
smaller space than the one where $U$ takes values. However, the same behaviour
can also be seen in the deterministic setting when the unknown is in $H^\tau$ and
the MAP estimate is thought of as a Tikhonov regularised solution, see [23].

Example 2. Let us study a simple example on the two-dimensional torus $\mathbb{T}^2$.
We consider the problem

$M_\delta = (I - \Delta)^{-1}U + \delta\mathcal{E},$

where $\mathcal{E}$ is normalised white Gaussian noise that takes values in $H^{-s}(\mathbb{T}^2)$,
$s > 1$, and $\delta > 0$ is the noise amplitude. The model operator $A = (I - \Delta)^{-1}$
is an elliptic operator, smoothing of order 2. Let us consider the case when $U$
has the a priori distribution $N(0, C_U)$ where $C_U = (I - \Delta)^{-2}$, that is, $r = 2$.
Then $U$ takes values in $H^\tau(\mathbb{T}^2)$, where $\tau = r - s < 1$, almost surely and
$U_\delta \in H^2$. Theorem 1 guarantees the convergence rate $C\delta$ when $\zeta < -3$ and
(2.12) when $-3 < \zeta < 1$. For example we get the following convergence in
$L^2$:

$\mathbb{E}\|U_\delta(\omega) - U(\omega)\|_{L^2(\mathbb{T}^2)} \le C\delta^{\frac{1}{4} - \epsilon}$

with $\epsilon > 0$ arbitrarily small.
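
Since both operators in Example 2 are Fourier multipliers, the mean squared error can be evaluated mode by mode, which gives a quick numerical sanity check of the rate. The sketch below rests on our own derivation, not on a formula from the paper: with $a_\ell = (1+|\ell|^2)^{-1}$ and $c_\ell = (1+|\ell|^2)^{-2}$, independent coefficients $\hat{U}_\ell \sim N(0, c_\ell)$ and $\hat{\mathcal{E}}_\ell \sim N(0,1)$, the error of the estimate (2.9) in mode $\ell$ has variance $\delta^2/(a_\ell^2 + \delta^2/c_\ell)$. The function name `rms_error`, the lattice truncation `L`, and the noise levels are assumptions made for the illustration.

```python
import numpy as np

def rms_error(delta, zeta=0.0, L=400):
    """sqrt(E ||U_delta - U||^2_{H^zeta}) for Example 2, truncated to |l_i| <= L."""
    ell = np.arange(-L, L + 1)
    w = 1.0 + ell[:, None] ** 2 + ell[None, :] ** 2   # symbol of I - Laplacian
    a = w ** -1.0                                     # A = (I - Laplacian)^{-1}
    c = w ** -2.0                                     # C_U = (I - Laplacian)^{-2}
    per_mode = delta ** 2 / (a ** 2 + delta ** 2 / c) # error variance per mode
    return float(np.sqrt(np.sum(w ** zeta * per_mode)))

deltas = np.array([1e-3, 1e-4, 1e-5, 1e-6])
errors = np.array([rms_error(d) for d in deltas])

# Least-squares slope of log(error) against log(delta).
slope = np.polyfit(np.log(deltas), np.log(errors), 1)[0]
print(errors, slope)
```

The fitted slope is close to $1/4$. Note that the computed quantity is the root of the expected squared norm, which by Jensen's inequality bounds $\mathbb{E}\|U_\delta - U\|_{L^2}$ from above, so it is compatible with, rather than identical to, the estimate in the example.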
2.1.2. Convergence results in the frequentist setting. In the frequentist case one
is often interested in the model

(2.14)  $M_\delta^\dagger(\omega) = A(u^\dagger) + \delta\mathcal{E}(\omega),$

where the data is generated by a 'true' solution $u^\dagger \in H^\tau(N)$. Above $\mathcal{E}$
is normalised white Gaussian noise and $\delta \in \mathbb{R}_+$ is the noise level. In (2.14)
all the randomness of $M_\delta^\dagger$ comes from the randomness of the noise $\mathcal{E}$.
We denote

(2.15)  $U_\delta^\dagger(\omega) = T_\delta(M_\delta^\dagger(\omega)).$

In the frequentist setting we consider the case where $u^\dagger$ is an arbitrary
element of the space $H^\tau(N)$ where the random variable $U$ takes values, instead
of considering almost every element. Note that even though a set is
measure-theoretically large, that is, has probability 1, it can still be topologically small
(meager, or a set of first Baire category). For a discussion on these issues
see [13].

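We can then study the convergence of $U_\delta^\dagger(\omega)$ to $u^\dagger$ or, in a more
frequentist spirit, the convergence of expectations, where we use the notation

(2.16)  $\mathbb{E}_{u^\dagger}\big(F(U_\delta^\dagger(\omega), u^\dagger, \omega)\big) = \int_\Omega F\big(T_\delta(Au^\dagger + \delta\mathcal{E}(\omega)), u^\dagger, \omega\big)\,d\mathbb{P}(\omega),$

that is, the expectation $\mathbb{E}_{u^\dagger}$ is taken with respect to the noise $\mathcal{E}(\omega)$ and the
other terms depending on $\omega$, and $u^\dagger$ is considered as a fixed parameter.
This means that after computing the estimator using Bayesian
methods we consider the convergence of the estimator $U_\delta^\dagger$ to the 'true'
solution $u^\dagger$, which is no longer thought of as a random draw from the prior.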



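Remark 2. Next we consider the frequentist case when in addition it is
assumed that $u^\dagger \in H^\tau(N)$ where $\tau > 0$. Note that then $r > d/2$. The mean
integrated squared error (MISE) of an estimator $\widehat{U}$ is defined by

(2.17)  $R(\widehat{U}, u^\dagger) = \mathbb{E}_{u^\dagger}\|\widehat{U} - u^\dagger\|_{L^2}^2.$

The minimax risk $r_\delta(H^\tau(N))$ on the Sobolev space $H^\tau(N)$ is then given by

$r_\delta(H^\tau(N)) = \inf_{\widehat{U}} \sup_{u^\dagger \in H^\tau(N)} R(\widehat{U}, u^\dagger),$

where the infimum is taken over all estimators of the form $\widehat{U} = g(M_\delta^\dagger)$ where
$g \in \mathcal{B}(H^{-s}, H^\tau)$. Here $\mathcal{B}(H^{-s}, H^\tau)$ is the set of Borel measurable functions
from $H^{-s}$ to $H^\tau$.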

Theorem 2. Let $r > s > d/2$ and $N$ be a $d$-dimensional closed manifold.
Let $u^\dagger \in H^\tau(N)$ where $\tau = r - s > 0$. Assume that $C_U$, the covariance
operator of the Gaussian prior, is a self-adjoint, injective and elliptic
pseudodifferential operator of order $-2r$. Let $\mathcal{E}(y,\omega)$ be white Gaussian noise on
$N$. Consider the measurement

$M_\delta^\dagger(y,\omega) = A(u^\dagger(\cdot)) + \delta\mathcal{E}(y,\omega), \quad \omega \in \Omega,$

where $A \in H\Psi^{-t,-t_0}$, $t > \max\{0, -\tau + s\}$ and $t \le t_0 < t + \tau/3$, is a
hypoelliptic pseudodifferential operator on the manifold $N$ and $A : L^2(N) \to L^2(N)$
is injective. Above $\delta \in \mathbb{R}_+$ is the noise level and $\mathcal{E}$ takes values in
$H^{-s}(N)$ with some $s > d/2$.

Then there is $C > 0$, independent of $\delta$ and $u^\dagger$, such that

(2.18)  $\mathbb{E}_{u^\dagger}\|U_\delta^\dagger(\omega) - u^\dagger\|_{L^2}^2 \le C(1 + \|u^\dagger\|_{H^\tau}^2)\,\delta^{\frac{2(\tau - 3(t_0 - t))}{t_0 + r}}.$


Note that the assumptions on $C_U$ in Theorem 2 imply that $C_U$ is the
covariance operator of a random variable taking values in $H^\tau(N)$, see Remark
3 below.

In the elliptic case $t = t_0$ we can write (2.18) as

(2.19)  $\mathbb{E}_{u^\dagger}\|U_\delta^\dagger(\omega) - u^\dagger\|_{L^2}^2 \le C(1 + \|u^\dagger\|_{H^\tau}^2)\,\delta^{\frac{2\tau}{t + r}}.$

Since $s = \frac{d}{2} + \epsilon$, the above convergence rate (2.19) agrees, up to $\epsilon > 0$
arbitrarily small, with the minimax convergence rate, see [6]. The convergence
of confidence regions is considered in a later section.


3. Generalised random variables

This section is largely based on the work of Lasanen [31], [32]; see also
the work of Piiroinen.

For any $s \in \mathbb{R}$, let $H^s(N)$ be the $L^2$-based Sobolev space equipped with the
Hilbert space inner product

(3.1)  $\langle \phi, \psi\rangle_{H^s} = \int_N \big((I - \Delta)^{s/2}\phi\big)(x)\,\big((I - \Delta)^{s/2}\psi\big)(x)\,dx.$

We also define a dual pairing between $H^{-s}(N)$ and $H^{s}(N)$,

(3.2)  $\langle \phi, \psi\rangle_{H^{-s}(N) \times H^{s}(N)} = \int_N \phi(x)\psi(x)\,dx$

when $\phi, \psi \in C^\infty(N)$. Note that $H^0(N) = L^2(N)$. We often denote $H^s =
H^s(N)$ and $L^2 = L^2(N)$.

A generalised Gaussian random variable $V$ takes values in the space of
generalised functions, and the pairing $\langle V, \phi\rangle$ with any test function $\phi \in \mathcal{D} =
C^\infty(N)$ is a Gaussian random variable taking values in $\mathbb{R}$, see [45]. The
generalised Gaussian random variables we will consider below are assumed
to take values in some Hilbert space, typically in a Sobolev space $H^s(N)$,
where the smoothness index $s \in \mathbb{R}$ may also be negative. Now, if $V$ takes
values in $H^s$ we say that $V$ has the covariance operator $B_V : H^s \to H^s$ if

(3.3)  $\mathbb{E}\big(\langle V - \mathbb{E}V, \phi\rangle_{H^s}\,\langle V - \mathbb{E}V, \psi\rangle_{H^s}\big) = \langle B_V\phi, \psi\rangle_{H^s}$

with any $\phi, \psi \in H^s$, see [15]. We can also define the covariance operator to be
a mapping $C_V : H^{-s} \to H^{s}$ such that

(3.4)  $\mathbb{E}\big(\langle V - \mathbb{E}V, \phi\rangle_{H^s \times H^{-s}}\,\langle V - \mathbb{E}V, \psi\rangle_{H^s \times H^{-s}}\big) = \langle C_V\phi, \psi\rangle_{H^s \times H^{-s}}$

with any $\phi, \psi \in H^{-s}$, see [3]. The connection between $B_V$ and $C_V$ is

$B_V = C_V(I - \Delta)^s : H^s \to H^s.$

Next we will take a closer look at generalised white Gaussian noise and
introduce the 'white noise paradox' by a simple example.
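3.1. White noise paradox. White noise $\mathcal{E}$ can be considered as a
measurable map $\mathcal{E} : \Omega \to \mathcal{D}'(N)$ where $\Omega$ is the probability space. Then
normalised white noise is a random generalised function $\mathcal{E}(y,\omega)$ on $N$ for which
the pairings $\langle \mathcal{E}, \phi\rangle_{\mathcal{D}' \times \mathcal{D}}$ are Gaussian random variables for all test functions
$\phi \in \mathcal{D} = C^\infty(N)$, $\mathbb{E}\mathcal{E} = 0$, and

(3.5)  $\mathbb{E}\big(\langle \mathcal{E}, \phi\rangle_{\mathcal{D}' \times \mathcal{D}}\,\langle \mathcal{E}, \psi\rangle_{\mathcal{D}' \times \mathcal{D}}\big) = \langle I\phi, \psi\rangle_{\mathcal{D}' \times \mathcal{D}}$  for $\phi, \psi \in \mathcal{D}$.

We will denote this by $\mathcal{E} \sim N(0, I)$. A realisation of $\mathcal{E}$ is the generalised
function $\varepsilon = \mathcal{E}(\cdot, \omega_0)$ on $N$ with a fixed $\omega_0 \in \Omega$.

The probability density function of white noise $\mathcal{E}$ is often formally written
in the form

(3.6)  $\pi_{\mathcal{E}}(\varepsilon) \underset{\text{formally}}{=} c\exp\Big(-\frac{1}{2}\|\varepsilon\|_{L^2(N)}^2\Big).$

However, the realisations of white Gaussian noise are almost surely not
in $L^2(N)$. This brings us back to the problem in formula (2.5).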


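Example 3. Let $\mathcal{E}$ be normalised white Gaussian noise defined on the
$d$-dimensional flat torus $\mathbb{T}^d = (\mathbb{R}/(2\pi\mathbb{Z}))^d$. Let $e_\ell$, $\ell = (\ell_1, \dots, \ell_d)
\in \mathbb{Z}^d$, be an orthonormal basis of $L^2(\mathbb{T}^d)$ consisting of eigenfunctions of the
Laplacian, numbered so that $-\Delta e_\ell = |\ell|^2 e_\ell$. Such functions $e_\ell(x)$ can be chosen
to be normalised products of the sine and cosine functions $\sin(\ell_j x_j)$ and
$\cos(\ell_j x_j)$ that form the standard Fourier basis of $L^2(\mathbb{T}^d)$. The Fourier
coefficients of $\mathcal{E}$ with respect to this basis are independent, normally distributed
$\mathbb{R}$-valued random variables with variance one, that is, $\langle \mathcal{E}, e_\ell\rangle \sim N(0, 1)$.
Then

$\mathbb{E}\|\mathcal{E}\|_{L^2(\mathbb{T}^d)}^2 = \sum_{\ell \in \mathbb{Z}^d} \mathbb{E}|\langle \mathcal{E}, e_\ell\rangle|^2 = \sum_{\ell \in \mathbb{Z}^d} 1 = \infty.$

This implies that realisations of $\mathcal{E}$ are in $L^2(\mathbb{T}^d)$ with probability zero.
However, when $s > d/2$,

(3.7)  $\mathbb{E}\|\mathcal{E}\|_{H^{-s}(\mathbb{T}^d)}^2 = \sum_{\ell \in \mathbb{Z}^d} (1 + |\ell|^2)^{-s}\,\mathbb{E}|\langle \mathcal{E}, e_\ell\rangle|^2 < \infty$

and hence $\mathcal{E}$ takes values in $H^{-s}(\mathbb{T}^d)$ almost surely (that is, with probability
one).

On the other hand [HI Theorem 2] implies that if $\|\mathcal{E}\|_{H^{-s}(\mathbb{T}^d)} < \infty$ almost
surely then $\mathbb{E}\|\mathcal{E}\|_{H^{-s}(\mathbb{T}^d)}^2 < \infty$, which yields $s > d/2$. We conclude that
the realisations of white noise $\mathcal{E}$ are almost surely in the space $H^{-s}(\mathbb{T}^d)$ if
and only if $s > d/2$. In particular, for $s \le d/2$ the function $x \mapsto \mathcal{E}(x, \omega)$ is in
$H^{-s}(\mathbb{T}^d)$ only when $\omega \in \Omega_0 \subset \Omega$ where $\mathbb{P}(\Omega_0) = 0$.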
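
The dichotomy in Example 3 is easy to observe numerically in dimension $d = 1$. The sketch below is only an illustration: it evaluates the partial sums $\sum_{|\ell| \le K}(1 + \ell^2)^{-s}$, which equal the expected squared $H^{-s}(\mathbb{T}^1)$ norm of white noise truncated to frequencies $|\ell| \le K$. The function name `expected_sq_norm` and the cut-offs are hypothetical; the limit value $\pi\coth(\pi)$ for $s = 1$ is the classical closed form of $\sum_{\ell \in \mathbb{Z}} (1 + \ell^2)^{-1}$.

```python
import math

def expected_sq_norm(s, K):
    """Partial sum over |l| <= K of (1 + l^2)^(-s) on the one-dimensional torus."""
    return sum((1.0 + l * l) ** (-s) for l in range(-K, K + 1))

# s = 0 is the L^2 case: the sum counts the modes and grows without bound.
l2_partial = expected_sq_norm(0.0, 1000)

# s = 1 > d/2: the partial sums converge, here to pi * coth(pi).
h_minus_1_partial = expected_sq_norm(1.0, 100_000)
limit = math.pi / math.tanh(math.pi)

# s = 1/2 = d/2 is the borderline case: logarithmic divergence.
borderline_growth = expected_sq_norm(0.5, 10_000) - expected_sq_norm(0.5, 100)

print(l2_partial, h_minus_1_partial, limit, borderline_growth)
```

The borderline growth between the cut-offs 100 and 10000 is close to $2\ln(100)$, consistent with the strict inequality $s > d/2$ in the example.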

3.2. The smoothness of the prior. Consider the continuum measurement
model $M_\delta = AU + \delta\mathcal{E}$ where the operator is now viewed as a smoothing
map $A : H^\tau(N) \to H^{\tau + t}(N)$ for all $\tau \in \mathbb{R}$. We construct the prior
by choosing $U$ to be a generalised Gaussian random variable taking values
in $H^\tau(N)$ and having expectation $\mathbb{E}U = 0$. First we will, however, give a
definition for pseudodifferential operators.
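Definition 1. Let $m \in \mathbb{R}$. We define the symbol class $S^m(\mathbb{R}^d \times \mathbb{R}^d)$ to consist
of $a(x,\xi) \in C^\infty(\mathbb{R}^d \times \mathbb{R}^d)$ such that for all multi-indices $\alpha$ and $\beta$ and any
compact set $K \subset \mathbb{R}^d$ there is a constant $C_{\alpha,\beta,K} > 0$ such that

$|\partial_\xi^\alpha \partial_x^\beta a(x,\xi)| \le C_{\alpha,\beta,K}(1 + |\xi|)^{m - |\alpha|}, \quad x \in K,\ \xi \in \mathbb{R}^d.$

Definition 2. Let $Y : U \to \mathbb{R}^d$ be local coordinates of the manifold $N$. A
bounded linear operator $A : \mathcal{D}'(N) \to \mathcal{D}'(N)$ is called a pseudodifferential
operator if for any local coordinates $Y : U \to V \subset \mathbb{R}^d$, $U \subset N$, there is a symbol
$a \in S^m(V \times \mathbb{R}^d)$ such that for $u \in C_0^\infty(U)$ we have

$Au(y_1) = \int_N k_A(y_1, y_2)\,u(y_2)\,dV_g(y_2),$

where $k_A|_{N \times N \setminus \mathrm{diag}(N)} \in C^\infty(N \times N \setminus \mathrm{diag}(N))$ and $\mathrm{diag}(N) = \{(y, y) \in N \times
N \mid y \in N\}$. Also, when $Y : U \to V \subset \mathbb{R}^d$ are local $C^\infty$-smooth coordinates,
$k_A(y_1, y_2)$ is given on $U \times U$ by

$k_A(Y^{-1}(x_1), Y^{-1}(x_2)) = \int_{\mathbb{R}^d} e^{i(x_1 - x_2)\cdot\xi}\,a(x_1, \xi)\,d\xi,$

where $x_1, x_2 \in V \subset \mathbb{R}^d$ and $a = a_Y \in S^m(V \times \mathbb{R}^d)$. In this case we will write

$A \in \Psi^m(N)$

and say that in local coordinates $Y : U \to V \subset \mathbb{R}^d$ the operator $A$ has the
symbol $a(x,\xi) \in S^m(V \times \mathbb{R}^d)$.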

We assume that the covariance operator $C_U \in \Psi^{-2r}$, that is, $C_U$ is
smoothing of order $2r$, self-adjoint and elliptic. With given $r \in \mathbb{R}$ we have to choose
$\tau \in \mathbb{R}$ so that $B_U = C_U(I - \Delta)^\tau \in \Psi^{-2(r - \tau)}$ is a trace class operator. An
operator $B_U$ is in the trace class if

(3.8)  $\mathrm{Tr}_{H^\tau \to H^\tau}(B_U) = \sum_{j=1}^{\infty} |\lambda_j| < \infty,$

where $\lambda_j$ are the eigenvalues of the operator $B_U$. The trace class condition
guarantees that $\mathbb{E}\|U\|_{H^\tau}^2 < \infty$.

Let $\nu_j$ be the eigenvalues of $B_U^{-1} \in \Psi^{2(r - \tau)}$. Counting the geometric
multiplicity of the eigenvalues, we arrange the eigenvalues of $B_U^{-1}$ in ascending
order as

$\nu_1 \le \nu_2 \le \dots \le \nu_j \le \dots$

Since $B_U^{-1}$ is a self-adjoint elliptic operator with smooth coefficients, Weyl's
law for elliptic operators tells us that the number $N(\nu) = \#\{\nu_j \mid \nu_j \le \nu\}$
of the eigenvalues of $B_U^{-1}$ on a closed manifold less than or equal to $\nu$ has the
asymptotics

$N(\nu) = c\,\nu^{\frac{d}{2(r - \tau)}} + O\big(\nu^{\frac{d - 1}{2(r - \tau)}}\big)$  when $\nu \to \infty$.

Hence for the eigenvalues $\lambda_j$ of the operator $B_U \in \Psi^{-2(r - \tau)}$ we have

$\lambda_j = c\,j^{-\frac{2(r - \tau)}{d}}(1 + o(1))$  when $j \to \infty$.
To satisfy condition (3.8) we require

$\sum_{j=1}^{\infty} |\lambda_j| \le C\sum_{j=1}^{\infty} j^{-\frac{2(r - \tau)}{d}} < \infty,$

which gives us the condition $\tau < -(d/2 - r)$. From here on we will assume
that $\tau = r - s < r - d/2$.
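
The eigenvalue decay predicted by Weyl's law can be observed in a model case. The following sketch assumes the flat torus $\mathbb{T}^2$ and the model operator $B_U = (I - \Delta)^{-(r - \tau)}$, whose eigenvalues are $(1 + |\ell|^2)^{-(r - \tau)}$, $\ell \in \mathbb{Z}^2$; this particular choice, the function name `eigenvalue_decay_slope`, and the fitting window are assumptions made for the illustration. For $d = 2$ the predicted decay exponent $-2(r - \tau)/d$ equals $-(r - \tau)$.

```python
import numpy as np

def eigenvalue_decay_slope(order, L=100, j_min=100, j_max=20000):
    """Fit the exponent p in lambda_j ~ c * j**p for (I - Laplacian)^(-order) on T^2.

    `order` plays the role of r - tau. Only lattice points with |l_i| <= L are
    used, so j_max must stay below the number of points in the disc of radius L.
    """
    ell = np.arange(-L, L + 1)
    w = 1.0 + ell[:, None] ** 2 + ell[None, :] ** 2
    lam = np.sort((w ** (-order)).ravel())[::-1]      # eigenvalues, descending
    j = np.arange(1, lam.size + 1)
    sel = (j >= j_min) & (j <= j_max)
    return float(np.polyfit(np.log(j[sel]), np.log(lam[sel]), 1)[0])

summable = eigenvalue_decay_slope(1.5)       # r - tau = 1.5 > d/2 = 1
not_summable = eigenvalue_decay_slope(0.8)   # r - tau = 0.8 < d/2 = 1
print(summable, not_summable)
```

The series in (3.8) converges exactly when the fitted exponent is below $-1$, which in this model case corresponds to $r - \tau > d/2$.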


Remark 3. Any elliptic operator $C_U \in \Psi^{-2r}$ that defines a non-negative
symmetric operator $C_U : \mathcal{D}(N) = C^\infty(N) \to \mathcal{D}'(N)$ has the property that
$B_U = C_U(I - \Delta)^\tau$ is a trace class operator on $H^\tau(N)$. By [3], these yield that
$C_U$ is the covariance operator of a random variable taking values in $H^\tau(N)$.

The operator $B_U$ corresponds formally to the smoothness prior

$\pi_{pr}(u) \underset{\text{formally}}{=} c\exp\Big(-\frac{1}{2}\langle B_U^{-1}u, u\rangle_{H^\tau}\Big) = c\exp\Big(-\frac{1}{2}\|C_U^{-1/2}u\|_{L^2}^2\Big).$

Notice that the realisations of $U$ are almost surely not $r$ times differentiable.
In the case $r < d/2$ the realisations of $U$ are almost surely not even in $L^2$, let
alone differentiable. This is why we need to consider $U$ as taking values in
some space $H^\tau(N)$ with a possibly negative smoothness index $\tau$.


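4. Proof of the main Theorems

Before proving Theorem 1 we give a short introduction to
hypoelliptic pseudodifferential operators.

Definition 3. Let $t, t_0 \in \mathbb{R}$. We define the symbol class $HS^{t,t_0}(\mathbb{R}^d \times \mathbb{R}^d)$ to
consist of $a(x,\xi) \in C^\infty(\mathbb{R}^d \times \mathbb{R}^d)$ for which

(1) For an arbitrary compact set $K \subset \mathbb{R}^d$ we can find positive
constants $R$, $c_1$ and $c_2$ such that

$c_1(1 + |\xi|)^{t_0} \le |a(x,\xi)| \le c_2(1 + |\xi|)^{t}, \quad |\xi| \ge R,\ x \in K.$

(2) For any compact set $K \subset \mathbb{R}^d$ there exist constants $R$ and $C_{\alpha,\beta,K}$
such that for all multi-indices $\alpha$ and $\beta$

$|\partial_\xi^\alpha \partial_x^\beta a(x,\xi)| \le C_{\alpha,\beta,K}\,|a(x,\xi)|\,(1 + |\xi|)^{-|\alpha|}, \quad |\xi| \ge R,\ x \in K.$

We will denote by $H\Psi^{t,t_0}(N)$ the class of $\Psi$DOs with local symbol $a(x,\xi) \in
HS^{t,t_0}(V \times \mathbb{R}^d)$, see Definition 2.

We denote $H^s(N) = H^s$ and $L^2(N) = L^2$, where $N$ is a closed manifold
and $\dim N = d$.

The proof of Theorem 1 is rather long and technical, so we will start by
going through its main steps in a nutshell. The approximate solution
we are studying is of the form

$T_\delta(m_\delta) := \arg\min_{u \in H^r(N)}\{\|Au\|_{L^2(N)}^2 - 2\langle m_\delta, Au\rangle + \delta^2\|C_U^{-1/2}u\|_{L^2(N)}^2\}.$

As mentioned before, the solution to this is

(4.1)  $T_\delta(m_\delta) = (A^*A + \delta^2 C_U^{-1})^{-1}A^* m_\delta.$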
We can rewrite the above as

(4.2)  $T_\delta(m_\delta) = Z_\delta^{-1}A^*Au + Z_\delta^{-1}A^*(\varepsilon\delta) = u - \delta^2 Z_\delta^{-1}C_U^{-1}u + Z_\delta^{-1}A^*(\varepsilon\delta),$

where $Z_\delta = A^*A + \delta^2 C_U^{-1}$.

To study the convergence of the last term on the right hand side of (4.2)
we would like to write it in the form

(4.3)  $Z_\delta^{-1}A^*(\varepsilon\delta) = \delta^{-1}F_\delta^{-1}(A^*A)^{-1}A^*\varepsilon,$

where $F_\delta = (A^*A)^{-1}C_U^{-1} + \delta^{-2}$. In order to show that (4.3) is well-defined
we first need to prove that $A^*A$ and $F_\delta$ are invertible.

Lastly we study the convergence of $\delta^2 Z_\delta^{-1}C_U^{-1}u$ and $\delta^{-1}F_\delta^{-1}(A^*A)^{-1}A^*\varepsilon$ to
zero in appropriate Sobolev spaces and show that the latter term is always
dominating.
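
The algebra behind (4.2) and (4.3) can be checked with matrices. The sketch below is a finite-dimensional stand-in under stated assumptions: a random tall matrix replaces the injective operator $A$, a random symmetric positive definite matrix replaces $C_U$, and all variable names are hypothetical. It uses $Z_\delta = A^TA + \delta^2 C_U^{-1} = \delta^2 A^TA\,F_\delta$ with $F_\delta = (A^TA)^{-1}C_U^{-1} + \delta^{-2}I$.

```python
import numpy as np

rng = np.random.default_rng(2)
k, n, delta = 10, 6, 0.05
A = rng.standard_normal((k, n))        # full column rank with probability one
B = rng.standard_normal((n, n))
C_U = B @ B.T + n * np.eye(n)          # symmetric positive definite stand-in
C_inv = np.linalg.inv(C_U)
u = rng.standard_normal(n)             # plays the role of the unknown
eps = rng.standard_normal(k)           # plays the role of the noise realisation

AtA = A.T @ A
Z = AtA + delta ** 2 * C_inv
F = np.linalg.solve(AtA, C_inv) + delta ** -2 * np.eye(n)

# Left-hand side: the estimate (4.1) applied to m = A u + delta * eps.
lhs = np.linalg.solve(Z, A.T @ (A @ u + delta * eps))

# Right-hand side: u minus the bias term of (4.2) plus the noise term of (4.3).
bias_term = delta ** 2 * np.linalg.solve(Z, C_inv @ u)
noise_term = np.linalg.solve(F, np.linalg.solve(AtA, A.T @ eps)) / delta
rhs = u - bias_term + noise_term

print(np.max(np.abs(lhs - rhs)))
```

In the matrix setting invertibility of $A^TA$ follows from full column rank; in the operator setting this is exactly what has to be proved next.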

We will start by showing that $A^*A \in H\Psi^{-2t,-2t_0}$ is invertible and $(A^*A)^{-1} \in
H\Psi^{2t_0,2t}$. Define $A^* : L^2(N) \to L^2(N)$ as the adjoint of the operator
$A : L^2 \to L^2$. We assumed in Theorem 1 that $A : L^2 \to L^2$ is one-to-one.
Since $A^*A : H^r \to H^{r+2t}$, $r \in \mathbb{R}$, is hypoelliptic [M, Propositions 5.2
and 5.3], $A^*Au = 0 \in C^\infty$ implies that $u \in C^\infty$ and hence $Au \in L^2$. Now we
see that if $A^*Au = 0$ then

$0 = \langle A^*Au, u\rangle_{L^2} = \langle Au, Au\rangle_{L^2} = \|Au\|_{L^2}^2,$

which implies $Au = 0$ and furthermore $u = 0$. Thus the operator $A^*A :
H^r(N) \to H^{r+2t}(N)$ is one-to-one.

To study the mapping $A^*A \in H\Psi^{-2t,-2t_0}$ from some Sobolev space $H^r$,
$r \in \mathbb{R}$, we define $L_r = A^*A : H^r \to H^{r+2t}$. The adjoint of $L_r$ is denoted by
$L_r' = (A^*A)' : H^{-(r+2t)} \to H^{-r}$. Let $\phi, \psi \in C^\infty$. Then

$\langle A^*A\phi, \psi\rangle_{H^{r+2t} \times H^{-(r+2t)}} = \langle \phi, A^*A\psi\rangle_{H^r \times H^{-r}},$

that is, $L_r' = L_{-(r+2t)}$ and hence the adjoint is one-to-one. Now we can
conclude that $L_r(H^r) \subset H^{r+2t}$ is a dense subset. Next we will prove two
lemmas concerning the surjectivity of the operator $A^*A$.

Since $A^*A \in HL^{-2t,-2t_0}_{1,0}$ is a hypoelliptic pseudodifferential operator it has a parametrix $B_1 \in HL^{2t_0,2t}_{1,0}$ [49, Theorem 5.1]. Hence for any $r_0 > 0$ we get the norm estimates

$$C_1\|A^*Au\|_{H^{r+2t}} \le \|u\|_{H^r} \le C_2\|A^*Au\|_{H^{r+2t_0}} + C_3\|u\|_{H^{r-r_0}} \tag{4.4}$$

for all $u \in C^\infty$. Next we will show that $C_3$ can be taken to be zero.

Lemma 1. Let $A^*A \in HL^{-2t,-2t_0}_{1,0}$ be an injective hypoelliptic pseudodifferential operator. Then we have the following estimates

$$C_1\|A^*Au\|_{H^{r+2t}} \le \|u\|_{H^r} \le C_2\|A^*Au\|_{H^{r+2t_0}}.$$

Proof. We get the first inequality since $A^*A$ is a continuous linear operator. If the second estimate in (4.4) is not valid with $C_3 = 0$ for any $C_2 > 0$ then we can choose a sequence $u_j$ such that $\|u_j\|_{H^{r-r_0}} = 1$ and $\|u_j\|_{H^r} \ge j\|A^*Au_j\|_{H^{r+2t_0}}$. When $j \ge 2C_2$ then

$$\|u_j\|_{H^r} \le C_2\|A^*Au_j\|_{H^{r+2t_0}} + C_3\|u_j\|_{H^{r-r_0}} \le \tfrac{1}{2}\|u_j\|_{H^r} + C_3\|u_j\|_{H^{r-r_0}}.$$

This gives us

$$\|u_j\|_{H^r} \le 2C_3\|u_j\|_{H^{r-r_0}} = 2C_3.$$

Since $r_0 > 0$, the embedding $H^r \hookrightarrow H^{r-r_0}$ is compact. Now there exists a subsequence $u_{j_\ell}$ and such a $w \in H^{r-r_0}$ that $\lim_{\ell\to\infty} u_{j_\ell} = w$ in $H^{r-r_0}$. We assumed that $\|u_{j_\ell}\|_{H^{r-r_0}} = 1$ which implies $\|w\|_{H^{r-r_0}} = 1$. On the other hand

$$\|A^*Au_{j_\ell}\|_{H^{r+2t_0}} \le \frac{1}{j_\ell}\|u_{j_\ell}\|_{H^r} \le \frac{2C_3}{j_\ell},$$

that is, $\lim_{\ell\to\infty}\|A^*Au_{j_\ell}\|_{H^{r+2t_0}} = 0$, and because $-r_0 + 2t < 2t_0$

$$\lim_{\ell\to\infty}\|A^*Au_{j_\ell}\|_{H^{r-r_0+2t}} = 0.$$

Since $u_{j_\ell} \to w$ in $H^{r-r_0}$ we also have $A^*Au_{j_\ell} \to A^*Aw$ in $H^{r-r_0+2t}$. Combining the above results we see that $\|A^*Aw\|_{H^{r-r_0+2t}} = 0$. The operator $A^*A$ is one-to-one and hence $w = 0$. This is a contradiction since $\|w\|_{H^{r-r_0}} = 1$. □

Lemma 2. Let $A^*A: H^r \to H^{r+2t}$ be an injective hypoelliptic operator. Then the image of $H^r$ under the map $A^*A$ satisfies

$$H^{r+2t_0} \subset A^*A(H^r) \subset H^{r+2t}.$$

Proof. The second inclusion is a direct consequence of the mapping properties of $A^*A$. Let $f \in H^{r+2t_0}$. Since $C^\infty \subset H^{r+2t_0}$ is a dense subset we can find such a sequence $f_j \in C^\infty$ that $\lim_{j\to\infty} f_j = f$ in $H^{r+2t_0}$. Since $A^*A(H^r) \subset H^{r+2t}$ is dense we can also choose a sequence $h_{j,\ell} = A^*Ag_{j,\ell} \in A^*A(H^r)$ such that $\lim_{\ell\to\infty} h_{j,\ell} = f_j$ in $H^{r+2t_0}$. Denote $g_j = g_{j,j} \in H^r$, for which $\lim_{j\to\infty} A^*Ag_j = f$ in $H^{r+2t_0}$. Using Lemma 1 we see

$$\lim_{j,k\to\infty}\|g_j - g_k\|_{H^r} \le C_2 \lim_{j,k\to\infty}\|A^*Ag_j - A^*Ag_k\|_{H^{r+2t_0}} = 0.$$

Hence $g_j \in H^r$ is a Cauchy sequence. Thus there exists such $g \in H^r$ that $\lim_{j\to\infty} g_j = g$ in $H^r$. On the other hand, by the first estimate of Lemma 1,

$$\lim_{j\to\infty}\|A^*Ag_j - A^*Ag\|_{H^{r+2t}} \le C_1^{-1}\lim_{j\to\infty}\|g_j - g\|_{H^r} = 0.$$

Combining the above we get $A^*Ag = f$. □

Using Lemma 2 we see that $A^*A(\mathcal{D}') = \mathcal{D}'$, that is, the operator $A^*A$ is also onto. Now we can conclude that there exists an inverse operator $(A^*A)^{-1}: \mathcal{D}' \to \mathcal{D}'$. It remains to show that the inverse operator is a hypoelliptic pseudodifferential operator.

Lemma 3. A self-adjoint, smoothing, one-to-one hypoelliptic operator $A^*A \in HL^{-2t,-2t_0}_{1,0}$ has an inverse operator $(A^*A)^{-1} \in HL^{2t_0,2t}_{1,0}$.

Proof. Denote $B = (A^*A)^{-1}: \mathcal{D}' \to \mathcal{D}'$. For an operator $A^*A: H^r \to H^{r+2t}$ we define $B_0 \subset B$ with domain

$$\mathcal{D}(B_0) = \{f \in H^{r+2t} \mid Bf \in H^r\} = A^*A(H^r).$$

Using the hypoellipticity of $A^*A$ we see that $A^*Au = f \in C^\infty$ implies $u \in C^\infty$. This gives us $B_0: C^\infty \to C^\infty$. Since $C^\infty$ is a Fréchet space and $A^*A$ is continuous and linear, $A^*A: C^\infty \to C^\infty$ is an open mapping [46, Theorem 2.11]. Hence the operator $B_0: C^\infty \to C^\infty$ is continuous.

Since $A^*A$ is hypoelliptic it has a parametrix $B_1 \in HL^{2t_0,2t}_{1,0}$ [49, Theorem 5.1],

$$B_1(A^*A) = I + K_1, \qquad (A^*A)B_1 = I + K_2, \qquad K_i \in \Psi^{-\infty},$$

and we can write

$$B_0 = B_0\big((A^*A)B_1 - K_2\big) = B_1 - B_0K_2.$$

The operator $B_0K_2: \mathcal{D}' \to C^\infty$ is continuous and thus we have shown that

$$B_0 = B_1 \mod \Psi^{-\infty}.$$

That is, $B_0 \in HL^{2t_0,2t}_{1,0}$. □

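Next we will examine $\Psi$DOs that depend on a spectral variable $\lambda \in \mathbb{R}_+$. For the general theory see [49].

Definition 4. The symbol class $S^m_p(V \times \mathbb{R}^d, \mathbb{R}_+)$ consists of the functions $a(x,\xi,\lambda)$ such that

(1) $a(x,\xi,\lambda_0) \in S^m(V \times \mathbb{R}^d)$ for every fixed $\lambda_0 > 0$ and

(2) for arbitrary multi-indices $\alpha$ and $\beta$ and any compact set $K \subset V$ there exists a constant $C_{\alpha,\beta,K}$ such that

$$|\partial_\xi^\alpha \partial_x^\beta a(x,\xi,\lambda)| \le C_{\alpha,\beta,K}\,(1+|\xi|+|\lambda|^{1/p})^{m-|\alpha|}$$

for $x \in K$, $\xi \in \mathbb{R}^d$ and $\lambda > 0$.

We denote by $\Psi^m_p(N, \mathbb{R}_+)$ the class of pseudodifferential operators $A_\lambda$ for which the local symbol $a(x,\xi,\lambda) \in S^m_p(V \times \mathbb{R}^d, \mathbb{R}_+)$.

Definition 5. If there are constants $C_1, C_2, R > 0$ such that the symbol $a(x,\xi,\lambda) \in S^m_p(V \times \mathbb{R}^d, \mathbb{R}_+)$ satisfies

$$C_1(|\xi| + |\lambda|^{1/p})^{m_0} \le |a(x,\xi,\lambda)| \le C_2(|\xi| + |\lambda|^{1/p})^{m}$$

for $|\xi| + |\lambda| \ge R$, we say that $a$ is hypoelliptic with parameter $\lambda$ and denote $a(x,\xi,\lambda) \in HS^{m,m_0}_p(V \times \mathbb{R}^d, \mathbb{R}_+)$. We will denote by $HL^{m,m_0}_p(N,\mathbb{R}_+)$ the class of $\Psi$DOs depending on the parameter $\lambda$ whose local symbol belongs to $HS^{m,m_0}_p(V \times \mathbb{R}^d, \mathbb{R}_+)$.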

Next we will prove that

$$F_\lambda = (A^*A)^{-1}C_U^{-1} + \lambda$$

is invertible. Operator G is hypoelliptic since {A* A)~^Cfj^ 

^ij/ 2 (io+r-), 2 (t+r)^jY^ jg hypoelliptic and A / G Denote Q = {A*A)~^Cfj^ 

and its symbol q(x, G Then for the symbol u(F a)( x, = 

C) + A of the operator Fx 

(g(x,0 + A)1 < + ICI + lAll/(2ho+r-)))2(to+r)-H, 


POSTERIOR CONSISTENCY AND CONVERGENCE RATES FOR BAYESIAN INVERSION 17 


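By [49, Theorem 9.2] there exists $R > 0$ such that for $|\lambda| \in [R,\infty)$ the operator $F_\lambda \in HL^{2(t_0+r),2(t+r)}_{2(t_0+r)}$ is invertible with

$$F_\lambda^{-1} \in HL^{-2(t+r),-2(t_0+r)}_{2(t_0+r)}(N,[R,\infty)). \tag{4.5}$$

We have now shown that the operator $Z_\delta^{-1}$ can be written

$$Z_\delta^{-1} = \lambda\big((A^*A)^{-1}C_U^{-1} + \lambda I\big)^{-1}(A^*A)^{-1},$$

where $\lambda = \delta^{-2}$. Hence we can rewrite (4.2) as

$$T_\lambda(m) = u - \lambda^{-1}Z_\delta^{-1}C_U^{-1}u + \sqrt{\lambda}\,F_\lambda^{-1}(A^*A)^{-1}A^*\varepsilon. \tag{4.6}$$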

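Now we will proceed to study the convergence of the second and third term on the right hand side of (4.6). For the third term of (4.6) we have

$$(A^*A)^{-1}A^*: H^{-s} \to H^{k}, \quad k = -s + t - 2t_0, \qquad \text{and} \qquad F_\lambda^{-1}: H^{k} \to H^{k+2(t+r)}.$$

Hence when $\zeta \le k + 2(t+r)$ we have

$$\|F_\lambda^{-1}(A^*A)^{-1}A^*\varepsilon\|_{H^\zeta} \le \|F_\lambda^{-1}\|_{k,\zeta}\,\|(A^*A)^{-1}A^*\varepsilon\|_{H^k},$$

where $\|F_\lambda^{-1}\|_{k,\zeta}$ is the norm of $F_\lambda^{-1}: H^k(N) \to H^\zeta(N)$ and $k,\zeta \in \mathbb{R}$. Next we want to study what happens to the norm when $\lambda \to \infty$.

We have the following norm estimates for $F_\lambda^{-1} \in \Psi^m_p(N,\mathbb{R}_+)$ when $\ell \ge m$ and $\lambda$ is large enough [49, Theorem 9.1]:

$$\|F_\lambda^{-1}\|_{k,k-\ell} \le C_k(1+|\lambda|^{1/p})^{m}, \qquad \text{if } \ell \ge 0, \tag{4.7}$$

$$\|F_\lambda^{-1}\|_{k,k-\ell} \le C_k(1+|\lambda|^{1/p})^{-(\ell-m)}, \qquad \text{if } \ell \le 0. \tag{4.8}$$

In our case $F_\lambda^{-1} \in \Psi^m_p$ where $m = -2(t+r)$ and $p = 2(t_0+r)$. We will write

$$\|F_\lambda^{-1}\|_{k,\zeta} = \|F_\lambda^{-1}\|_{k,k-\ell}, \qquad \text{where } \ell = k - \zeta \ge m.$$

First we study the case when $\ell \ge 0$, that is, $\zeta \le k$. Inequality (4.7) gives us the norm estimate

$$\|F_\lambda^{-1}\|_{k,k-\ell} \le C_k(1+|\lambda|^{1/p})^{m}.$$

Because we want $\sqrt{\lambda}\,\|F_\lambda^{-1}(A^*A)^{-1}A^*\varepsilon\|_{H^\zeta}$ to converge when $\lambda \to \infty$ we have to require that

$$\frac{m}{p} = \frac{-2(t+r)}{2(t_0+r)} < -\frac{1}{2}.$$

This is true when $t_0 < 2t + r$.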


When $\ell < 0$ we have $k < \zeta \le k + 2(t+r)$ and can use (4.8):

$$\|F_\lambda^{-1}\|_{k,k-\ell} \le C(1+|\lambda|^{1/p})^{-(\ell-m)}.$$

For convergence we need

$$\frac{m - \ell}{p} = \frac{-2(t+r) - k + \zeta}{2(t_0+r)} < -\frac{1}{2},$$

that is, $k < \zeta < k + 2t + r - t_0 = r - s - 3(t_0 - t)$, which can be true only if $t_0 < 2t + r$.

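Next we will prove the convergence of the term $\delta^2 Z_\delta^{-1}C_U^{-1}u$ in $H^\zeta$. Since we got above that $\zeta < \tau - 3(t_0 - t)$ we can write $\zeta = \tau - \theta$ where $\theta > 3(t_0 - t) \ge 0$. We need to find such $\eta \ge 0$ and $\gamma \ge 0$ that $\eta + \gamma = 1$ and $t_0\gamma - r\eta + r - \theta/2 = 0$. Define $\gamma = \theta/(2(t_0+r))$ and $\eta = 1 - \theta/(2(t_0+r))$. Now $\eta \ge 0$ only if $\theta \le 2(t_0+r)$. Hence we will choose $\theta = \min\{\tau - \zeta, 2(t_0+r)\}$. Since $Z_\delta = A^*A + \delta^2C_U^{-1}$, where $A^*A \ge c_1(I-\Delta)^{-t_0}$ and $c_2(I-\Delta)^{r} \le C_U^{-1} \le c_3(I-\Delta)^{r}$, we get

$$\delta^2\big\|(A^*A)^{-\gamma}(\delta^2C_U^{-1})^{-\eta}c_3(I-\Delta)^{r+\zeta/2}u\big\|_{L^2} \le \delta^2\big\|(c_1(I-\Delta)^{-t_0})^{-\gamma}(c_2\delta^2(I-\Delta)^{r})^{-\eta}(I-\Delta)^{r-\theta/2+\tau/2}u\big\|_{L^2} = C\delta^{\theta/(t_0+r)}\|u\|_{H^\tau}, \tag{4.9}$$

where $\theta = \min\{\tau - \zeta, 2(t_0+r)\}$.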

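Adding the above results together we can prove Theorem 1.

Proof of Theorem 1. To get the speed of convergence we use the fact that $U$ and $\mathcal{E}$ are independent. Similarly to (4.6) we get

$$U_\delta(\omega) - U(\omega) = -\delta^2Z_\delta^{-1}C_U^{-1}U(\omega) + \delta^{-1}F_\delta^{-1}(A^*A)^{-1}A^*\mathcal{E}(\omega), \tag{4.10}$$

where by (4.9),

$$\|\delta^2Z_\delta^{-1}C_U^{-1}U\|_{H^\zeta} \le C\delta^{\theta/(t_0+r)}\|U\|_{H^\tau}. \tag{4.11}$$

For the second part on the right hand side of (4.10) we can write

$$\mathbb{E}\|F_\delta^{-1}(A^*A)^{-1}A^*\mathcal{E}(\omega)\|^p_{H^\zeta} \le \|F_\delta^{-1}(A^*A)^{-1}A^*\|^p_{H^{-s}\to H^\zeta}\,\mathbb{E}\|\mathcal{E}(\omega)\|^p_{H^{-s}}, \tag{4.12}$$

with $p \in \{1,2\}$ and $\theta = \min\{\tau - \zeta, 2(t_0+r)\}$.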

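When $\zeta \le t - s - 2t_0$ we get

$$\mathbb{E}\|U_\delta(\omega) - U(\omega)\|_{H^\zeta} \le \delta^2\,\mathbb{E}\|Z_\delta^{-1}C_U^{-1}U(\omega)\|_{H^\zeta} + \delta^{-1}\,\mathbb{E}\|F_\delta^{-1}(A^*A)^{-1}A^*\mathcal{E}(\omega)\|_{H^\zeta}$$

$$\le C_1\delta^{\theta/(t_0+r)}\,\mathbb{E}\|U(\omega)\|_{H^\tau} + C_2\,\delta^{-1}\delta^{2(t+r)/(t_0+r)}\,\mathbb{E}\|\mathcal{E}\|_{H^{-s}} \le C\max\left\{\delta^{\theta/(t_0+r)},\ \delta^{(2t+r-t_0)/(t_0+r)}\right\},$$

where $\theta = \min\{\tau - \zeta, 2(t_0+r)\}$. Next we will study which of the terms is dominating. The noise term $\delta^{-1}F_\delta^{-1}(A^*A)^{-1}A^*\mathcal{E}(\omega)$ is dominating if

$$2t + r - t_0 \le \theta.$$

Assume first that $\theta = \tau - \zeta$. Then

$$\theta = \tau - \zeta \ge 2t_0 + r - t \ge 2t + r - t_0.$$

If $\theta = 2(t_0+r)$ we get

$$\theta = 2(t_0+r) \ge 2t_0 + t_0 - 2t + r \ge 2t + r - t_0,$$

since $t \le t_0 < 2t + r$. Hence the noise term is dominating in both cases and we have proven

$$\mathbb{E}\|U_\delta(\omega) - U(\omega)\|_{H^\zeta} \le C\delta^{(2t-t_0+r)/(t_0+r)}.$$

If $t - s - 2t_0 < \zeta < \tau - 3(t_0 - t)$ we get

$$\mathbb{E}\|U_\delta(\omega) - U(\omega)\|_{H^\zeta} \le C_1\delta^{(\tau-\zeta)/(t_0+r)}\,\mathbb{E}\|U(\omega)\|_{H^\tau} + C_2\,\delta^{-1}\delta^{(2(t+r)+\ell)/(t_0+r)}\,\mathbb{E}\|\mathcal{E}(\omega)\|_{H^{-s}} \le C\max\left\{\delta^{(\tau-\zeta)/(t_0+r)},\ \delta^{(2t+r-t_0+\ell)/(t_0+r)}\right\}.$$

Above $\ell = t - s - 2t_0 - \zeta$. Note that when $\zeta > t - s - 2t_0$ then $\tau - \zeta < 2(r+t_0)$. The noise term is dominating if

$$2t + r - t_0 + \ell \le \tau - \zeta.$$

This is always true since $\tau = r - s$ and

$$2t + r - t_0 + \ell = r - s - \zeta - 3(t_0 - t) \le \tau - \zeta.$$

Hence we can conclude

$$\mathbb{E}\|U_\delta(\omega) - U(\omega)\|_{H^\zeta} \le C\delta^{(2t+r-t_0+\ell)/(t_0+r)}.$$

□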
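The two cases of the proof can be collected into a small helper that returns the exponent of $\delta$ in the noise-dominated bound. The Python sketch below is only an illustration of that bookkeeping: the function name is hypothetical, and the admissibility checks encode the conditions $t \le t_0 < 2t + r$ and $\zeta < \tau - 3(t_0 - t)$ used above.

```python
def convergence_exponent(t, t0, r, s, zeta):
    """Exponent kappa in E||U_delta - U||_{H^zeta} <= C * delta**kappa,
    following the two cases in the proof of Theorem 1 (illustrative helper)."""
    tau = r - s
    if not (t <= t0 < 2 * t + r):
        raise ValueError("need t <= t0 < 2t + r")
    if zeta >= tau - 3 * (t0 - t):
        raise ValueError("need zeta < tau - 3(t0 - t)")
    k = t - s - 2 * t0
    # For zeta <= k estimate (4.7) applies and ell does not enter the rate;
    # for zeta > k estimate (4.8) applies with ell = k - zeta < 0.
    ell = min(k - zeta, 0.0)
    return (2 * t + r - t0 + ell) / (t0 + r)
```

For instance, in an elliptic model with $t = t_0 = 1$, $r = 2$ and $s = 1.5$ the helper gives exponent $1$ for $\zeta = -3$ and $1/6$ for $\zeta = 0$, so the rate deteriorates as the norm becomes stronger.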


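Proof of Theorem 2. Similarly to (4.10) we get

$$U^\dagger_\delta(\omega) - u^\dagger = -\delta^2Z_\delta^{-1}C_U^{-1}u^\dagger + \delta^{-1}F_\delta^{-1}(A^*A)^{-1}A^*\mathcal{E}(\omega), \tag{4.13}$$

where the first term on the right side satisfies, by (4.9),

$$\|\delta^2Z_\delta^{-1}C_U^{-1}u^\dagger\|_{H^\zeta} \le C\delta^{\theta/(t_0+r)}\|u^\dagger\|_{H^\tau} \tag{4.14}$$

with $\theta = \min\{\tau - \zeta, 2(t_0+r)\}$. The expectation of the second term in the right side is estimated in (4.12). Analysing the obtained terms as in the proof of Theorem 1 we obtain

$$\mathbb{E}\|U^\dagger_\delta(\omega) - u^\dagger\|^2_{L^2} \le C\big(1 + \|u^\dagger\|_{H^\tau}\big)^2\,\delta^{2(\tau-3(t_0-t))/(t_0+r)}.$$

□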


5. Posterior distribution and confidence regions 

One advantage Bayesian inversion offers over deterministic regularization is uncertainty quantification. Since the solution to the Bayesian inverse problem is the posterior distribution of the unknown, we can study its credible sets and their contraction in some Sobolev space when $\delta \to 0$. A Bayesian credible set is a region in the posterior distribution that contains a large fraction of the posterior mass, for instance, 95%. We are dealing with Gaussian distributions so we define the credible sets to be central regions. This means these sets are defined as central balls with $u_\delta$ as a centre.

The above mentioned credible sets are often used to visualise the remaining Bayesian uncertainty in the estimate. Frequentists use another kind of uncertainty quantification called confidence region. A confidence region is a range of values that frequently includes the unknown of interest if the experiment is repeated. We can define confidence regions as central balls with $u^\dagger_\delta$ as the centre. Here $u^\dagger_\delta$ is the frequentist approximated solution generated by a true solution $u^\dagger$. How frequently the ball around the approximated solution, with different realisation of the noise, contains the true solution is
determined by the confidence level. See for example [T5l 1^ . 

In the finite-dimensional parametric case and under mild conditions on 
the prior Bernstein-von Mises theorem provides that the credible sets of 
smooth models are asymptotically equivalent with the frequentist confidence 
regions based on the maximum likelihood estimator, see [56]. In infinite¬ 
dimensional case there is no corresponding theorem and Bayesian credible 
sets are not automatically frequentist confidence sets. This means that 
if we assume that the data is generated by a ‘true parameter’, it is not 
automatically true that credible sets contain that truth with probability 
at least the credible level. However the correspondence of Bayesian and 
frequentist uncertainty has been studied in many recent papers see e.g. jH 
El EH E3 03 E2|- These results are important since they show that some 
credible sets can give a good idea of the uncertainty of the estimate in 
the classical sense. In this section we show that the posterior distribution converges and we give some convergence rates. We also prove that in the elliptic case the frequentist posterior contraction rate agrees, up to $\epsilon > 0$ arbitrarily small, with the minimax convergence rate. We do not address the question of the frequentist coverage of the credible sets.

We will start by studying the convergence of the posterior covariance $C_\delta$ which, with the convergence of the posterior mean $U_\delta$, guarantees the convergence of the posterior distribution.

When $U \sim N(0, C_U)$, $\mathcal{E} \sim N(0, I)$ and

$$M_\delta = AU + \delta\mathcal{E}, \tag{5.1}$$

the conditional probability distribution of $U$ with respect to the measurement $M_\delta$ is a Gaussian measure with mean $U_\delta$ and covariance [36, 37]

$$C_\delta = C_U - C_UA^*(AC_UA^* + \delta^2I)^{-1}AC_U. \tag{5.2}$$

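If $A^*: \mathcal{D}'(N) \to \mathcal{D}'(N)$ is invertible we can rewrite the above as

$$C_\delta = \delta^2\big(A^*A + \delta^2C_U^{-1}\big)^{-1} = \big((A^*A)^{-1}C_U^{-1} + \delta^{-2}I\big)^{-1}(A^*A)^{-1}. \tag{5.3}$$

Note that the covariance operator is deterministic and thus independent of $M_\delta$.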
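The equivalence of (5.2) and (5.3) is an instance of the matrix inversion lemma, and in finite dimensions it can be checked directly. The Python sketch below is a sanity check under stated assumptions, not part of the paper's argument: operators are replaced by small real matrices chosen arbitrarily (with $C_U$ symmetric positive definite), and the helper functions are hypothetical, written with the standard library only.

```python
def matmul(X, Y):
    """Product of two dense matrices given as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def transpose(X):
    return [list(row) for row in zip(*X)]

def identity(n):
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]

def combine(X, Y, alpha=1.0, beta=1.0):
    """Entrywise alpha * X + beta * Y."""
    return [[alpha * x + beta * y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]

def inverse(X):
    """Gauss-Jordan elimination with partial pivoting."""
    n = len(X)
    eye = identity(n)
    aug = [list(map(float, X[i])) + eye[i] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda i: abs(aug[i][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for i in range(n):
            if i != col:
                f = aug[i][col]
                aug[i] = [vi - f * vc for vi, vc in zip(aug[i], aug[col])]
    return [row[n:] for row in aug]

def posterior_cov_52(A, C, delta):
    """C_U - C_U A^T (A C_U A^T + delta^2 I)^{-1} A C_U, as in (5.2)."""
    At = transpose(A)
    S = combine(matmul(matmul(A, C), At), identity(len(A)), 1.0, delta ** 2)
    K = matmul(matmul(matmul(matmul(C, At), inverse(S)), A), C)
    return combine(C, K, 1.0, -1.0)

def posterior_cov_53(A, C, delta):
    """delta^2 (A^T A + delta^2 C_U^{-1})^{-1}, as in (5.3)."""
    Z = combine(matmul(transpose(A), A), inverse(C), 1.0, delta ** 2)
    return [[delta ** 2 * v for v in row] for row in inverse(Z)]

# Arbitrary test data: A and C_U do not commute, C_U is positive definite.
A = [[1.0, 0.5, 0.0], [0.0, 0.25, 0.1], [0.2, 0.0, 0.05]]
C = [[2.0, 0.3, 0.0], [0.3, 1.0, 0.2], [0.0, 0.2, 0.5]]
delta = 0.1
gap = max(abs(x - y)
          for rx, ry in zip(posterior_cov_52(A, C, delta), posterior_cov_53(A, C, delta))
          for x, y in zip(rx, ry))
```

The quantity `gap` is the largest entrywise difference between the two formulas and should be at the level of rounding error. Neither formula depends on the data, in line with the remark that $C_\delta$ is deterministic.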

We define $F_\lambda = (A^*A)^{-1}C_U^{-1} + \lambda I$, where $\lambda = \delta^{-2}$, as in Section 4. Then

$$F_\lambda^{-1} \in \Psi^m_p(N,[R,\infty)),$$

where $m = -2(t+r)$ and $p = 2(t_0+r)$. Using the norm estimate (4.8) we get

$$\|C_\delta\|_{-\tau,\tau} \le \big\|\big((A^*A)^{-1}C_U^{-1} + \delta^{-2}I\big)^{-1}\big\|_{-\tau-2t,\tau}\,\|(A^*A)^{-1}\|_{-\tau,-\tau-2t} \le c\,\|F_\lambda^{-1}\|_{k,k-\ell} \le c(1+\lambda^{1/p})^{-(\ell-m)}.$$

Above $k = -\tau - 2t$ and $\ell = -2(\tau+t) < 0$. Since $\tau = r - s$ we can write

$$\frac{\ell - m}{p} = \frac{-\tau + r}{t_0+r} = \frac{s}{t_0+r},$$

and hence we get the following convergence rate for the posterior covariance

$$\|C_\delta\|_{-\tau,\tau} \le c(1+\lambda^{1/p})^{-(\ell-m)} \le c\,\delta^{2s/(t_0+r)}. \tag{5.4}$$

We see that the more smoothing the forward operator A is the worse 
convergence we get. Note that r and s do not only affect the convergence 
speed but also the spaces between which the norm is taken. 
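The decay of the posterior covariance norm can be observed numerically in the elliptic, commuting model case. The Python sketch below rests on assumptions that are not in the text: $t_0 = t$, the operators $A^*A$ and $C_U^{-1}$ are replaced by the multipliers $x^{-t}$ and $x^{r}$ with $x = 1 + |n|^2$ sampled on a logarithmic grid, and the function names are hypothetical. Under these assumptions the norm from $H^{-\tau}$ to $H^{\tau}$ is the supremum of $x^{\tau}$ times the symbol of $C_\delta$, and the derivation leading to (5.4) suggests the rate $\delta^{2s/(t+r)}$.

```python
import math

def covariance_norm(delta, t, r, s, num=4001, decades=12.0):
    """sup over x = 1 + |n|^2 of x**tau * c_delta(x), where
    c_delta(x) = delta^2 / (x**(-t) + delta^2 * x**r) is the symbol of C_delta
    in the assumed multiplier model; x runs over a log grid in [1, 10**decades]."""
    tau = r - s
    best = 0.0
    for i in range(num):
        x = 10.0 ** (decades * i / (num - 1))
        c_delta = delta ** 2 / (x ** (-t) + delta ** 2 * x ** r)
        best = max(best, x ** tau * c_delta)
    return best

def observed_rate(t, r, s, d1=1e-3, d2=1e-5):
    """Slope of log(norm) against log(delta) between two noise levels."""
    n1, n2 = covariance_norm(d1, t, r, s), covariance_norm(d2, t, r, s)
    return math.log(n1 / n2) / math.log(d1 / d2)
```

With $t = 1$, $r = 2$ and $s = 1$ the observed slope should be close to $2s/(t+r) = 2/3$; increasing $t$ lowers the slope, which is the statement that a more smoothing forward operator gives slower convergence.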


Remark 4. Observe that the random variable $U$ takes values in $H^\tau$ and the estimate (5.4) concerns the mapping properties of the posterior covariance operator from the dual space $H^{-\tau} = (H^\tau)'$ to the space $H^\tau$. For strictly positive $\delta > 0$ the MAP estimator $U_\delta$ belongs to the space $H^r$, $r = \tau + s > \tau$, but as $\delta \to 0$, the MAP estimators $U_\delta$ converge in a less regular space $H^\zeta$, $\zeta < \tau - 3(t_0 - t) \le \tau$, see (2.10).


5.1. Contraction of the posterior distribution. Next we consider the 
inverse problem using the frequentist setting described in Subsection 2.1.2| 
with the additional assumption that r > 0. We assume below that Cu 
satisfies the assumptions in Theorem that in particular imply that Cjj is 
the covariance operator of a random variable U taking values in see 

Remark]^ We recall that we consider a fixed ‘true’ solution G H'^{N) 
and the noise model Mj(u;) = Au^ + 5£{uj) as in (2.14). Also, note that the 
MAP-estimate is then f/j = Ts{mI). 

In the frequentist case one is often interested in the the limiting behaviour 
of the posterior measure when 5 —)• 0. Here, P^t is a random measure in 


depending on 6 and the MAP estimator Pj = Pj(a;) = Ts{mI{u})) 
(that further depends on the deterministic variable and the realisation 
£{uj) of the random noise). Let Ws be a Gaussian random variable, taking 
values in that is independent of the noise £, has zero mean and 

the covariance operator Cs, see (3.4). For a measurable set B C H'^{N) we 
define 

(5.5) -fVt(-®) = e \ws = b- P](w) with b E p|^ 


where fis = ^i0,Cs)- Roughly speaking, P^t is a Gaussian measure in 

H'^{N) with the mean uj and the covariance operator Cs- 

Recall that we consider the probability space (H, S,P) and denote by xs 
the indicator function of S. Let S' E S. We use the notations 


(5.6) iF{uj, u^)) := E(P(a;, nt)|P), (S) := P(S|P) = E{xs\F), 

for the conditional expectation and conditional probability. Above P C S is 
the cj-algebra generated by random variable Mg (a;) or equivalently, the noise 
£{uj). Roughly speaking, in the notation E„tP the subindex reminds that 
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v) is a fixed parameter and the expectation is taken only with respect the 
noise. The notation indicates that the measure of B is computed 

using the posterior probability measure which mean uj = Ts{Mj) depends 
on the measurement m|. Since the random variable Ws has distribution /i^, 
we have by (5.5) 


(5.7) P^,iB) = 


lm{N) 


Xb{w5 + ul)dns{ws) = Fj^^{{Ws + ul G B}). 


Following the approach in dnniEziEZ! we next show that the posterior 
measure contracts to a Dirac measure centred on the fixed true solution . 

Theorem 3. Let r > s > d/2 and N be a d-dimensional closed manifold. 
Let G H'^{N) where r = r — s > 0. Assume that Cu, the covariance 
operator of the Gaussian prior, is a self-adjoint, injeetive and elliptic pseu¬ 
dodifferential operator of order—2r. LetS{y,uj) be white Gaussian noise on 
N. Gonsider the measurement 

= A{u\-))+ 6S{y,uj), w G D, 

where A G t > max{0, —r + s} and t < to < t + r/3, is a 

hypoelliptic pseudodifferential operator on the manifold N and A : L?‘{N) —>■ 
Lf{N) is injective. We assume also that A* : P'{N) —)> 'D'{N) is invertible. 
Above 6 G M+ is the noise level and E takes val ues in H~^(N) with some 
s > d/2. Let ul be the MAP estimated given by (2.15). 

Let K < kq = , Co > 0, and R > 0. Then there is ci > 0 such 

that 

(5.8) 

sup e dL^iN) I \\u - 0, 

as (5 —>■ 0. 

Proof. Let G H'^{N) and Ws be the Gaussian variable dehned above. 
Using the Markov inequality and (5.7), we get 

|n G H^(N) 


P 


Ml 


|tt — U'*'||i2 > 


co5"} 


= 


(5.9) 


< 


({llITj + f/l-u'llL^W SCO'S"}) 


Since Ws and t/j are independent and Ws has the covariance operator Cs, 
we obtain using notations (5.6) 


E,tE t(||lU, + C/]-ut||2,(^^) ^ E,tE t||fU5||i2+E,tE t||t/l-ut||2 


L2 


(5.10) 


— ^^L2(Ar)-s.L2(A)(C'5) IIl2(^)• 

We have shown in Theorem that the second term on the right side of 

(5.10) can be estimated by C 2 (l + with some C 2 > 0. Hence it is 

enough to show that 


TrL2^L<Cs) < 
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with Kq 


_ 2{t ^3^0 t)) ^ estimate the trace by writing 

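$$\operatorname{Tr}_{L^2\to L^2}(C_\delta) = \operatorname{Tr}_{L^2\to L^2}\big((I-\Delta)^{-s}(I-\Delta)^{s}C_\delta\big) \le \operatorname{Tr}_{L^2\to L^2}\big((I-\Delta)^{-s}\big)\,\|(I-\Delta)^{s}C_\delta\|_{L^2\to L^2}.$$

Above $(I-\Delta)^{-s}$ is a trace class operator in $L^2(N)$ since

$$\sum_{j=1}^{\infty}(1+\lambda_j)^{-s} < \infty \quad \text{when } s > d/2,$$

where $\lambda_j$ denote the eigenvalues of $-\Delta$ on $N$.

As before we get

$$\|(I-\Delta)^{s}C_\delta\|_{0,0} \le \|(I-\Delta)^{s}\|_{2s,0}\,\|F_\lambda^{-1}\|_{-2t,2s}\,\|(A^*A)^{-1}\|_{0,-2t} \le c\,\|F_\lambda^{-1}\|_{-2t,-2t-\ell} \le c(1+\lambda^{1/p})^{-(\ell-m)}.$$

Above $\ell = -2(s+t) < 0$. We can write

$$\frac{\ell - m}{p} = \frac{\tau}{t_0+r}$$

and hence

$$\operatorname{Tr}_{L^2\to L^2}(C_\delta) \le c\,\delta^{2\tau/(t_0+r)} \le c\,\delta^{2\kappa_0}.$$

□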


Note that in the elliptic case to = t we get contraction 

G H'^{N) I \\u - tt^||L 2 (jv) > —)■ 0 

when (5 —)■ 0 for all cq > 0 and k < Since s = | + e the above 

convergence rate agrees, up to e > 0 arbitrarily small, with the minimax 
convergence rate. 

Remark 5. Above we have assumed that $u^\dagger$ is in $H^\tau$. This corresponds to the fact that the random variable $U$, having the covariance operator $C_U$, takes values in $H^\tau$. The $L^2(N)$ norm in the contraction formula (5.8) can be considered as a loss function on $H^\tau(N)$. Note that the loss function $d(v_1,v_2) = \|v_1 - v_2\|_{L^2}$ defines a distance function in $H^\tau(N)$ but the obtained metric space is not complete. When the direct map $A$ is the identity map, similar estimates with different loss functions have been studied in a general setting in [18]. However, from the point of view of inverse problems [18] corresponds to the case when the direct operator and the covariance operator of the prior commute. This differs from the problem analysed in our paper, where the covariance operator $C_U$ and the operator $A$ may not commute, and are of quite different type in the sense that $C_U$ is an elliptic operator but $A$ is a hypoelliptic operator. The phenomenon that the solution is assumed to be in a smoother space, in our case in $H^\tau$, and the convergence of the posterior distribution is analysed using a loss function given by a less strict norm, in our case the $L^2$-norm, appears in many frequentist studies, see e.g. Theorems 2.2 and 2.3 and Remark 3.6 in [1]. Conditions similar to the smoothness requirement $u^\dagger \in H^\tau$ are also encountered in classical regularisation theory [9], where conditions of this type are called source conditions.

5.2. Convergence of the posterior distribution in Bayesian settings. 

Next we will proceed to study the contraction of the posterior distribution using Bayesian techniques, the measurement model $M_\delta = AU + \delta\mathcal{E}$ and the MAP estimator $U_\delta = T_\delta(M_\delta)$. Let us write

$$V_\delta = U_\delta + W_\delta,$$

where $W_\delta \sim N(0, C_\delta)$ is a Gaussian variable having the covariance operator $C_\delta$ given in (5.3) and zero mean. The random variables $W_\delta$ and $U_\delta$ are assumed to be independent. Let $\mathcal{M}_\delta = \sigma(M_\delta)$ be the $\sigma$-algebra generated by the random variable $M_\delta$. Then the distribution of the random variable $V_\delta$ is the same as the posterior distribution of $U$ with respect to the $\sigma$-algebra $\mathcal{M}_\delta$.

Let us be the posterior distribution of U with respect to the cr-algebra 
Ms- Equivalently us is the distribution of Vs in the Sobolev space 
where Ci ^ Let fj,s be the distribution of the random variable Ws which 
is independent of Us = Ts{Ms). Then the conditional expectation of the 
indicator function with respect to Ms is 

^ BcAUs,R{S)),R{6))}\Ms){u;) 
(5.11) =¥{{WseB^^{0,R{6))}){oo) 

= fis{BcAO,Rm{^)- 

Above B(^^{0, R{S)) denotes a ball in of radius R{6). 

Let U be the cr-algebra generated by U. By [8l Theorem 10.2.2] there 
are regular conditional probabilities R{K \ Ms){oo) for all K G and 

cj G n such that 


P{K\Ms){io)=E{xK{U)\Ms){io) a.s. 

Moreover, by applying 0 Theorem 10.2.1] to the joint distribution of {U, Ms) 
we see that there are such functions 

(m, K) ^ Fm{K) =: F{{U e K} \ Ms = m), 

defined for m G H~^ and K G that 


(5.12) 


^MsU){K) = P[K\Ms){u:) 


Using (5.11) and (5.12) we see that 


a.s. 


f({U ^B(^^{Ts{m),R{5))]\Ms = m) = iis[B^MR{5))). 

Note that the right hand side is in fact independent of m and depends only 
on 5. Next we will give a theorem for the credible sets 


P({U5 G B^,{Us,R{5))]) = P({C/ G B^,{Ts{ms),R{5))] \ Ms = ms). 
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Theorem 4. Let U, 8, Ms and A be defined as in Theorem\^ and assume 
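that $A^*: \mathcal{D}'(N) \to \mathcal{D}'(N)$ is invertible.

Let $\mathcal{M}_\delta = \sigma(M_\delta)$ and $\mathcal{U} = \sigma(U)$ be the $\sigma$-algebras generated by the random variables $M_\delta$ and $U$, respectively. Then the posterior distribution of the random variable $U$ with respect to the $\sigma$-algebra $\mathcal{M}_\delta$ can be given in terms of the function

$$(m, K) \mapsto P(\{U \in K\} \mid M_\delta = m),$$

where $m \in H^{-s}(N)$ and $K$ is a measurable set, cf. (5.12).

Take $\zeta_1 < \tau + t - t_0$ and $\alpha < \gamma/2$. Then if $R(\delta) = c_1\delta^{\alpha}$ we have the following contraction:

$$P\big(\{U \in B_{\zeta_1}(T_\delta(m_\delta), R(\delta))\} \mid M_\delta = m_\delta\big) \ge 1 - C\delta^{\gamma-2\alpha} \to 1$$

when $\delta \to 0$. The speed of contraction depends on $\zeta_1$.

(i) If $\zeta_1 \le -s - t_0$ then $\gamma = 2(t+r)/(t_0+r)$.

(ii) If $-s - t_0 < \zeta_1 < \tau + t - t_0$ then $\gamma = 2(\tau + t - t_0 - \zeta_1)/(t_0+r)$.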


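Proof. We use below $R = R(\delta) = c_1\delta^{\alpha}$ with some $\alpha > 0$ and denote

$$p_\delta = 1 - \mu_\delta\big(B_{\zeta_1}(0,R(\delta))\big).$$

To study what happens to $p_\delta$ we first notice that

$$\mathbb{E}\big(\|W_\delta\|^2_{H^{\zeta_1}}\big) = \int_{H^{\zeta_1}} \|w\|^2_{H^{\zeta_1}}\,d\mu_\delta(w) \ge \int_{\|w\|_{H^{\zeta_1}} \ge R(\delta)} \|w\|^2_{H^{\zeta_1}}\,d\mu_\delta(w) \ge R(\delta)^2\int_{\|w\|_{H^{\zeta_1}} \ge R(\delta)} d\mu_\delta(w) = R(\delta)^2\,p_\delta. \tag{5.13}$$

Next we will prove that

$$\mathbb{E}\big(\|W_\delta\|^2_{H^{\zeta_1}}\big) = \operatorname{Tr}_{H^{\zeta_1}\to H^{\zeta_1}}(B_{W_\delta}) \le C\delta^{\gamma}$$

with some $\gamma$ that depends on $\zeta_1$. Above we use the definition $B_{W_\delta} = C_{W_\delta}(I-\Delta)^{\zeta_1}$, where $C_{W_\delta}: H^{-\zeta_1} \to H^{\zeta_1}$ and

$$\langle C_{W_\delta}\phi, \psi\rangle_{H^{\zeta_1}\times H^{-\zeta_1}} = \mathbb{E}\big(\langle W_\delta,\phi\rangle_{H^{\zeta_1}\times H^{-\zeta_1}}\,\langle W_\delta,\psi\rangle_{H^{\zeta_1}\times H^{-\zeta_1}}\big).$$

As noted before, when $A^*$ is invertible we can write

$$C_{W_\delta} = F_\lambda^{-1}(A^*A)^{-1}, \tag{5.14}$$

where $F_\lambda = (A^*A)^{-1}C_U^{-1} + \lambda I$ and $\lambda = \delta^{-2}$.

We want to estimate

$$\mathbb{E}\big(\|W_\delta\|^2_{H^{\zeta_1}}\big) = \mathbb{E}\big(\|(I-\Delta)^{\zeta_1/2}W_\delta\|^2_{L^2}\big).$$

Let us define

$$\widetilde{W}_\delta = (I-\Delta)^{\zeta_1/2}W_\delta \in L^2(N).$$

We can write the covariance operator of $\widetilde{W}_\delta$ as

$$C_{\widetilde{W}_\delta} = (I-\Delta)^{\zeta_1/2}C_{W_\delta}(I-\Delta)^{\zeta_1/2}: L^2 \to L^2.$$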

26 


HANNE KEKKONEN, MATTI LASSAS AND SAMULI SILTANEN 


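Note that in $L^2$ we have $B_{\widetilde{W}_\delta} = C_{\widetilde{W}_\delta}$. Now we get for the trace

$$\operatorname{Tr}_{L^2\to L^2}(C_{\widetilde{W}_\delta}) = \operatorname{Tr}_{L^2\to L^2}\big((I-\Delta)^{\zeta_1}C_{W_\delta}\big) = \operatorname{Tr}_{L^2\to L^2}\big((I-\Delta)^{-s}(I-\Delta)^{\zeta_1+s}C_{W_\delta}\big) \le \operatorname{Tr}_{L^2\to L^2}\big((I-\Delta)^{-s}\big)\,\|(I-\Delta)^{\zeta_1+s}C_{W_\delta}\|_{L^2\to L^2}.$$

Using (5.14) we get

$$\|(I-\Delta)^{\zeta_1+s}C_{W_\delta}\|_{L^2\to L^2} = \|(I-\Delta)^{\zeta_1+s}F_\lambda^{-1}(A^*A)^{-1}\|_{0,0} \le \|(I-\Delta)^{\zeta_1+s}\|_{2(\zeta_1+s),0}\,\|F_\lambda^{-1}\|_{-2t_0,2(\zeta_1+s)}\,\|(A^*A)^{-1}\|_{0,-2t_0} \le C\|F_\lambda^{-1}\|_{-2t_0,2(\zeta_1+s)}.$$

Above $F_\lambda^{-1} = \big((A^*A)^{-1}C_U^{-1} + \lambda\big)^{-1} \in \Psi^m_p$ where $m = -2(t+r)$ and $p = 2(t_0+r)$. We want to use the norm estimates (4.7) and (4.8) so we write

$$\|F_\lambda^{-1}\|_{-2t_0,2(\zeta_1+s)} = \|F_\lambda^{-1}\|_{-2t_0,-2t_0-\ell},$$

where $\ell = -2(s + \zeta_1 + t_0)$. To use the norm estimates we need to assume $\ell \ge m$, that is, $\zeta_1 \le r - s + t - t_0 = \tau + t - t_0$.

First we assume that $\ell \ge 0$, which is true when $\zeta_1 \le -s - t_0$. Then

$$\|F_\lambda^{-1}\|_{-2t_0,-2t_0-\ell} \le C(1+\lambda^{1/p})^{-2(t+r)}$$

and $\|F_\lambda^{-1}\|_{-2t_0,-2t_0-\ell} \to 0$ when $\lambda \to \infty$ with all $t, t_0 > 0$ and $r > 0$.

Next we assume $\ell < 0$. Then for $-s - t_0 < \zeta_1 \le \tau + t - t_0$ we get

$$\|F_\lambda^{-1}\|_{-2t_0,-2t_0-\ell} \le C(1+\lambda^{1/p})^{2(s+\zeta_1+t_0)-2(t+r)}.$$

Now $\|F_\lambda^{-1}\|_{-2t_0,-2t_0-\ell} \to 0$ when $\lambda \to \infty$ if $\zeta_1 < \tau + t - t_0$.

We have proven that

$$\mathbb{E}\big(\|W_\delta\|^2_{H^{\zeta_1}}\big) \le C\delta^{\gamma},$$

where $\gamma = 2(t+r)/(t_0+r)$ if $\zeta_1 \le -s - t_0$ and $\gamma = 2(\tau + t - t_0 - \zeta_1)/(t_0+r)$ if $-s - t_0 < \zeta_1 < \tau + t - t_0$. Hence using the above estimate and (5.13) we see that

$$p_\delta \le \frac{C\delta^{\gamma}}{(c_1\delta^{\alpha})^2} = \widetilde{C}\delta^{\gamma-2\alpha}.$$

Above we have to assume $\alpha < \gamma/2$ to have the convergence $p_\delta \to 0$ when $\delta \to 0$.

Finally, since we have denoted $u_\delta = T_\delta(m_\delta)$, we can conclude that with the above choices for $R(\delta)$, $\gamma$ and $\alpha$

$$P\big(\{U \in B_{\zeta_1}(T_\delta(m_\delta),R(\delta))\} \mid M_\delta = m_\delta\big) \ge 1 - \widetilde{C}\delta^{\gamma-2\alpha} \to 1$$

when $\delta \to 0$.

□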


5.3. Discussion. Above, we have considered in the frequentist setting the case when the solution $u^\dagger$ is an element of $H^\tau(N)$ with $\tau > 0$ and studied in Theorems 2 and 3 the convergence of the MAP estimators and the contraction of the posterior distribution in $L^2(N)$.

In the Bayesian setting we have examined the case when the solution is a realisation of the random variable $U$. In Theorems 1 and 4 we have studied the convergence of the MAP estimators and the contraction of the posterior distribution in $H^\zeta(N)$ with various values of $\zeta$.

In classical regularisation theory for linear inverse problems, one is usually interested in the convergence of the optimisers of the minimisation problem (2.5) to the true solution in the space $H^r$, $r > \tau + d/2$, as the noise level $\delta$ goes to zero. This gives restrictions to the measurement noise that can be considered. Summarising, our above statistical considerations concern the case where the unknown and the noise are significantly less smooth than in the standard setting of the regularisation theory. In recent regularisation theory, inverse problems where the direct map $A$ and the regularisation term are non-linear have been studied extensively. It is interesting to ask how our analysis on the contraction of the posterior distribution could be generalised for such non-linear inverse problems that correspond to non-Gaussian statistical problems.


Appendix A. Some examples of hypoelliptic operators 


A linear partial differential operator $P$ is hypoelliptic if for every distribution $u$ such that $P(u)$ is $C^\infty$ smooth also $u$ is $C^\infty$. Every elliptic operator with smooth coefficients is hypoelliptic. The heat operator

Pu{x, t) = dtu — kAxU, (x, t) G X M 
and Kolmogorov operator [301ES] 

(A.l) Pu{x,y,t) = dxxU + xdyU — dtu, {x,y,t) 


are examples of operators that are hypoelliptic but not elliptic. General 
Kolmogorov type hypoelliptic diffusion operators are used e.g. in the theory 
of kinetic equations, statistical physics and mathematical finance [I71139]. 

The fact that (A.1) is hypoelliptic follows from Hörmander's theorem on hypoelliptic PDEs. Let $(X_0, X_1, \dots, X_p)$ be real $C^\infty$ vector fields in the $d$ dimensional manifold $N$. If $X$ and $Y$ are two vector fields we define the bracket of $X$ and $Y$ by

$$[X,Y]f = X(Yf) - Y(Xf).$$

Note that $[X,Y]$ is a new vector field.

Definition 6 (Hörmander Condition). We say that the Hörmander condition is satisfied if the real $C^\infty$ vector fields $(X_0, X_1, \dots, X_p)$, $p \le n$, in the manifold $N$ generate a Lie algebra of rank $n = \dim N$ at every point $x \in N$.


This means that the vector fields

$$X_{j_1},\ [X_{j_1},X_{j_2}],\ [X_{j_1},[X_{j_2},X_{j_3}]],\ \dots$$

span a space that has the same dimension $n$ as the manifold $N$ at every point $x \in N$. Now we can formulate Hörmander's classical theorem |2n| .

Theorem 5 (Hörmander's theorem). The operator

$$P = \sum_{j=1}^{p} X_j^2 + X_0$$

defined on an $n$ dimensional manifold $N$ is hypoelliptic if the vector fields $(X_0, X_1, \dots, X_p)$ satisfy the Hörmander condition.

By writing

$$X_1 = \partial_x, \qquad X_0 = x\partial_y - \partial_t, \qquad [X_1,X_0] = \partial_y$$

we see that (A.1) is indeed hypoelliptic in $\mathbb{R}^3$. Next we will give another important example of vector fields satisfying Hörmander's condition.


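Example 4. Let us study the Heisenberg group $\mathbb{H}$. Let $\Gamma$ be a discrete subgroup of the Heisenberg group $\mathbb{H}$ such that $\mathbb{H}/\Gamma$ is compact, see e.g. [12]. The orthonormal frame on $\mathbb{H}$ is given by the Lie vector fields

$$X = \partial_x + \tfrac{1}{2}y\,\partial_t, \qquad Y = \partial_y - \tfrac{1}{2}x\,\partial_t, \qquad Z = \partial_t.$$

We can easily see that

$$[X,Y] = -Z,$$

so that $X$, $Y$ and $[X,Y]$ span the tangent space at every point. Using Theorem 5 we get that the sub-Laplacian

$$P = \tfrac{1}{2}(X^2 + Y^2)$$

on $\mathbb{H}/\Gamma$ is hypoelliptic.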
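The rank condition in Definition 6 can be verified numerically for both examples. The Python sketch below is an illustration under stated assumptions: vector fields are represented by their coefficient functions in the coordinates $(x, y, t)$, the drift field of the Kolmogorov operator is taken to be $X_0 = x\partial_y - \partial_t$ (read off from (A.1)), the Heisenberg-type fields use the coefficient $1/2$, and derivatives are approximated by central differences. All function names are hypothetical.

```python
def directional_derivative(F, p, v, h=1e-5):
    """Central-difference derivative of the coefficient vector of F at p along v."""
    plus = [pi + h * vi for pi, vi in zip(p, v)]
    minus = [pi - h * vi for pi, vi in zip(p, v)]
    return [(a - b) / (2 * h) for a, b in zip(F(plus), F(minus))]

def lie_bracket(X, Y, p):
    """Coefficients of [X, Y] at p, using [X, Y] = (DY) X - (DX) Y."""
    dY = directional_derivative(Y, p, X(p))
    dX = directional_derivative(X, p, Y(p))
    return [a - b for a, b in zip(dY, dX)]

def det3(u, v, w):
    """Determinant of the 3x3 matrix with rows u, v, w."""
    return (u[0] * (v[1] * w[2] - v[2] * w[1])
            - u[1] * (v[0] * w[2] - v[2] * w[0])
            + u[2] * (v[0] * w[1] - v[1] * w[0]))

# Kolmogorov operator (A.1): X1 = d_x, X0 = x d_y - d_t (assumed drift field).
X1 = lambda p: [1.0, 0.0, 0.0]
X0 = lambda p: [0.0, p[0], -1.0]

# Heisenberg-type frame with assumed coefficient 1/2.
HX = lambda p: [1.0, 0.0, 0.5 * p[1]]
HY = lambda p: [0.0, 1.0, -0.5 * p[0]]

point = [0.7, -1.3, 2.0]  # arbitrary test point
kolmogorov_bracket = lie_bracket(X1, X0, point)
kolmogorov_det = det3(X1(point), X0(point), kolmogorov_bracket)
heisenberg_det = det3(HX(point), HY(point), lie_bracket(HX, HY, point))
```

A non-zero determinant means that the two fields together with their bracket span the three-dimensional tangent space at the test point, which is the Hörmander condition there. The spanning property does not depend on the sign convention chosen for the bracket.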


Example 5. One example of hypoelliptic inverse problem is the heat equa¬ 
tion on a compact manifold X = M x M where M is a closed two-dimensional 
manifold. Note that in this paper we have considered the problem on a 
compact manifold, that is, our results are applicable in the case when the 
equation is periodic in time. We are interested in solving the heat sources 
U{x,t) from the noisy measurements Ms{x,t) of temperature T{x,t), that 
is we want to solve U from 


$$(\partial_t - \Delta_x)T(x,t) = U(x,t), \tag{A.2}$$

$$M_\delta(x,t) = (\partial_t - \Delta_x)^{-1}U(x,t) + \delta\mathcal{E}. \tag{A.3}$$

The operator $A = (\partial_t - \Delta_x)^{-1}$ is not elliptic but it is hypoelliptic of type $(1,2)$.

Such situations arise in non-invasive monitoring. Consider, for example, 
using a thermal camera to record video footage of a car with engine run¬ 
ning. Let us model the metal surface of the car as a compact and closed 
two-dimensional manifold M. The running engine produces heat which we 
observe in the video data. The temperature on the car surface is modelled 
as the solution T{x,t) defined on X = M x M. Equation (A.2) describes 
the conduction of heat along the car surface. The effect of the engine is simply modelled as the heat source term $U(x,t)$; recovering $U$ will provide information about the state of the engine.


Appendix B. Computational example 


Since the operator $A$ does not have a continuous inverse operator $L^2 \to L^2$, the condition number of the matrix approximation $\mathbf{A}$ of the operator $A$ grows when the discretisation is refined. This is the very reason why regularisation is needed in the (numerical) solutions of the inverse problems.

Next we demonstrate the above results numerically and consider a two-dimensional deblurring problem on $\mathbb{T}^2$,

$$M = AU + \delta\mathcal{E},$$

where $\mathcal{E}(\omega) \in H^{-s}$, $s > 1$, a.s. is normalised white noise and $A$ is an elliptic operator, smoothing of order 2,

$$(Au)(x) = \mathcal{F}^{-1}\big((1+|n|^2)^{-1}(\mathcal{F}u)(n)\big)(x).$$

The true solution $u^\dagger$, see Subsection 2.1.2 for the frequentist interpretation, is a piecewise linear function presented in Figure 1. We choose the a priori distribution $N(0, C_U)$ where $C_U = (I-\Delta)^{-1}$. Then a random draw $U(\omega)$ from the prior distribution belongs to $H^\tau$, where $\tau < 0$, with probability one. The Cameron-Martin space $Z_U$ of a measurable mapping $U: \Omega \to X$ is defined by

$$Z_U = \{\phi \in X \mid \|\phi\|_{Z_U} = \langle C_U^{-1}\phi, \phi\rangle_{X^*\times X} < \infty\}.$$

The Cameron-Martin space can also be defined as

$$Z_U = \bigcap\{Y \mid Y \subset X \text{ linear subspace},\ P(\{U \in Y\}) = 1\}.$$

The approximated solutions $U_\delta$ belong to $Z_U$ and with the chosen a priori distribution we have $Z_U = H^1(\mathbb{T}^2)$.

Figure 1. On the left the original piecewise linear function $u^\dagger$. On the right side the noiseless data $Au^\dagger$.

Solving u from Au{x) = m{x) corresponds to the solution of ordinary 
differential equation (1 — d‘^)m(x) = u{x) so A can be thought e.g. as a 
blurring operator. 














The approximated solution to the problem is

$$u^\dagger_\delta = \big(A^*A + \delta^2(I-\Delta)\big)^{-1}A^*m^\dagger_\delta.$$

We get from Subsection 2.1.2 that

$$\lim_{\delta\to 0}\mathbb{E}\|U^\dagger_\delta - u^\dagger\|_{H^\zeta} = 0$$

when $\zeta < \tau < 0$. This behaviour can be seen even in numerical simulations when the discretisation is fine enough, see Figure 2. In Figure 3 we have compared the expected convergence rates given in formula (2.12) in Theorem 1 to the computational convergence rates. In the numerical simulations in Figures 2 and 3 we see that for the test case presented in Figure 1 the convergence $u^\dagger_\delta \to u^\dagger$ in different Sobolev spaces follows well the mean convergence predicted by Theorem 1.
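Because every operator in this example is a Fourier multiplier on $\mathbb{T}^2$, the estimator acts independently on each Fourier coefficient. The Python sketch below shows that per-mode formula under the assumptions of this appendix as read here, namely the forward symbol $(1+|n|^2)^{-1}$ and the prior $C_U = (I-\Delta)^{-1}$. The function names are hypothetical and the sketch is not the code used for the figures.

```python
def forward_symbol(n1, n2):
    """Assumed symbol of the blurring operator A at frequency n = (n1, n2)."""
    return 1.0 / (1.0 + n1 * n1 + n2 * n2)

def estimate_mode(n1, n2, m_hat, delta):
    """Fourier coefficient of (A*A + delta^2 (I - Laplacian))^{-1} A* m
    at frequency (n1, n2), given the data coefficient m_hat."""
    x = 1.0 + n1 * n1 + n2 * n2       # symbol of I - Laplacian
    a = forward_symbol(n1, n2)        # real symbol, so A* = A
    return a * m_hat / (a * a + delta ** 2 * x)
```

With noiseless data `m_hat = forward_symbol(n1, n2) * u_hat` and `delta = 0` the function returns `u_hat`; for positive `delta` the high frequencies are damped, since the penalty `delta ** 2 * x` grows with $|n|$ while `a * a` decays.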

Figure 2. Normalised errors c(C)||u^ — ul\\fjC(T 2 'j in loga¬ 
rithmic scale with different values of C- We use normalisa¬ 
tion constants c(C) = observe that 

does not converge to G in when (^ > 0. 
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Figure 3. Normalised errors c(C)||rt^ — in log¬ 

arithmic scale with different values of C- The normal¬ 
ized bounds ( |2.12[ ) given in Theorem for the expectations 
ci{C)^\\U— Us\\hi:(j 2 -j are plotted with dashed lines. The nor¬ 
malized errors 0(^)1111^ — for the example given 

in Figure 1 are plotted with solid lines.